Building data-parallel pipelines in Erlang

About the Talk

March 29, 2012 7:25 AM

Marines’ Memorial Club & Hotel, 609 Sutter Street, San Francisco

Data-parallel processing frameworks are being introduced at a fast pace, and Erlang seems particularly well suited to soft real-time applications with a high level of data parallelism and processing concurrency where reliability is important. With the increased memory size and multi-core capabilities of modern processors and the introduction of high-performance persistent storage, Erlang covers an increasing portion of the response time vs. data volume chart and complements existing large-scale analytics platforms such as Hadoop very well.

This talk makes the case for building soft real-time, scalable data-parallel processing pipelines in Erlang.

We present an architecture and a simple specification language for building data-parallel flows in Erlang, and share use cases covering data-parallel methods such as map-reduce and iterative graph algorithms to illustrate the flexibility of the proposed approach. We also discuss other important elements of the architecture: capacity planning for typical use cases, the relationship with other ecosystem components, instrumentation and monitoring, scheduling, replication, and failover.

Talk objectives: Introduce the general area of data-parallel processing. Make the case for using Erlang for data-parallel processing. Share a proposed architecture for building data-parallel pipelines in Erlang. Share use cases and lessons learned, and solicit feedback on the proposed architecture.

Target audience: System and infrastructure architects, large-scale data architects and scientists, CTOs, CIOs.
