Skip to content


Scalable real-time data processing with Apache Samza

A talk at Jfokus, Stockholm, Sweden, 04 Feb 2015

Abstract

High-volume event streams are becoming widespread: IoT sensor data, activity events on social media, and monitoring events for fraud detection, to mention just a few. Hadoop is great for analysing data after the fact, but it’s often too slow to respond to things happening right now. Traditional event processing frameworks are not scalable enough to handle the onslaught of data.

Apache Samza and Apache Kafka, two open source projects that originated at LinkedIn, have set out to solve this problem. They are designed to go together: Kafka is a fault-tolerant message broker, and Samza provides a scalable and powerful processing model on top of it.

This talk will introduce those projects, and explain how you can use them to solve your real-time big data problems.