How does Twitter analyze its massive dataset? What tools do we use, and where do we focus our analysis? In this talk, I will discuss our transition from a MySQL-based to a Hadoop-based data infrastructure and our use of Pig (a scripting language built on top of Hadoop) to democratize big-da…
Most startups begin with a basic LAMP stack (on PHP or Python) and then add database replication and memcache as they grow. But then what? There's a big gap between these out-of-the-box solutions and what it takes to run something bigger.
Massive growth in the size of business datasets leads many companies to Hadoop, an emerging architecture for parallel data processing. However, the migration path can be challenging, in part because MapReduce analyses use programming languages like Java and Python rather than SQL. Apache Pig…