Scalable stream processing with Kafka and Samza
A talk at Unified Log London, London, UK, 02 Dec 2014
This was a repeat of my talk at ApacheCon Europe.
Abstract
Samza, an Apache Incubator project, is a framework for processing and analysing high-volume data
streams. It is built upon Apache Kafka and YARN (Hadoop 2.0).
You can think of Samza as a real-time, continuously running version of MapReduce.
In this talk, Martin will show why stream processing is becoming an important part of the
architecture of data-intensive applications, alongside storage and batch processing. We will explore
how Samza works, and show how it reliably processes millions of messages per second. We will also
examine what kinds of applications would benefit from using Samza.