Scalable stream processing with Apache Samza and Apache Kafka
A talk at ApacheCon Europe, Budapest, Hungary, 18 Nov 2014
Abstract
Samza, an Apache Incubator project, is a framework for processing and analysing high-volume data
streams. It is built upon Apache Kafka and YARN (Hadoop 2.0).
You can think of Samza as a real-time, continuously running version of MapReduce.
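To make the MapReduce analogy concrete, here is a minimal conceptual sketch in plain Python. It does not use Samza's or Kafka's API: the function names and the in-memory list standing in for a stream are assumptions made purely for illustration. It shows a word count whose map and reduce steps run once per incoming message, so results are updated continuously instead of being produced once at the end of a batch.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def map_words(message: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one message."""
    for word in message.lower().split():
        yield word, 1


def streaming_word_count(messages: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Reduce step applied incrementally, one message at a time.

    A batch job would return a single result after reading all input.
    Here a snapshot of the running counts is yielded after every message,
    so the input is allowed to be an unbounded stream.
    """
    counts: Counter = Counter()
    for message in messages:
        for word, n in map_words(message):
            counts[word] += n
        yield dict(counts)


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for a stream of messages.
    stream = ["samza reads kafka", "kafka stores streams"]
    for snapshot in streaming_word_count(stream):
        print(snapshot)
```

A real stream processor additionally has to partition the input, keep the running state durable, and recover from failures, none of which this sketch attempts.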
In this talk, Martin will show why stream processing is becoming an important part of the
architecture of data-intensive applications, alongside storage and batch processing. We will explore
how Samza works, and show how it reliably processes millions of messages per second. We will also
examine what kinds of applications would benefit from using Samza.
This talk is for anyone interested in large-scale data processing problems. Developers working with
Hadoop, distributed storage (e.g. HBase, Cassandra) or real-time data flows will find it
particularly interesting. You will learn:
- What kinds of real-time data problems you can solve with Samza;
- How the stream processing model helps developers write more reliable applications more easily;
- Apache Samza’s approach to stream processing, and how it compares to other frameworks;
- How to contribute to Samza’s development.