Main Content

Pig

About the Talk

October 30, 2009 5:45 AM

Georgia Tech Research Institute Conference Center

Georgia Tech Research Institute Conference Center

Massive growth in the size of business datasets leads many companies to Hadoop, an emerging architecture for parallel data processing. However, the migration path can be challenging, in part because MapReduce analyses use programming languages like Java and Python rather than SQL. Apache Pig is a high-level framework built on top of Hadoop that offers a powerful yet vastly simplified way to analyze data in Hadoop. It allows businesses to leverage the power of Hadoop in a simple language readily learnable by anyone that understands SQL. In this presentation, I will introduce Pig and show how it's been used at Twitter to solve numerous analytics challenges that became intractable with our former MySQL-based architecture.

Ratings and Recommendations

Avg. Rating

Average based
on 1 rating

comments powered by Disqus