Patterns for real-time stream processing
A talk at Crunch Conference, Budapest, Hungary, 30 Oct 2015
Abstract
You have some streams of data, such as user activity on a website, or sensor readings from devices.
Now you want to process the data and make it useful with low latency: for example, generating
real-time recommendations, detecting abuse, filtering spam or predicting demand. And you want it to
scale well.
Perhaps you’ve heard of distributed stream processing frameworks such as Samza, Storm or Spark
Streaming, which may do what you want, but you’re not sure how to use them most effectively.
This talk will introduce some common design patterns for working with high-volume, real-time data
streams. We will look at things like joining, enriching, filtering and aggregating streaming data,
and we’ll explore how you might break down an application into streaming operators that do what you
want.
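As a rough illustration of what "breaking an application into streaming operators" can look like, here is a minimal sketch in plain Python generators. It is not the API of Samza, Storm or Spark Streaming; the event fields (`ts`, `user_id`, `type`), the `profiles` lookup table and all function names are assumptions made up for this example. It chains three operators: filter, enrich (a join against a lookup table) and aggregate (counts per tumbling window, assuming events arrive in timestamp order).

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, object]  # hypothetical event shape: {"ts", "user_id", "type"}


def filter_events(events: Iterable[Event], event_type: str) -> Iterator[Event]:
    """Filter operator: pass through only events of the given type."""
    for event in events:
        if event["type"] == event_type:
            yield event


def enrich(events: Iterable[Event], profiles: Dict[str, Dict[str, str]]) -> Iterator[Event]:
    """Enrich operator: join each event with a user profile lookup table."""
    for event in events:
        profile = profiles.get(event["user_id"], {})
        yield {**event, "country": profile.get("country", "unknown")}


def count_per_window(
    events: Iterable[Event], window_secs: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Aggregate operator: count events per country in tumbling windows.

    Assumes events arrive in timestamp order; a window is emitted as soon
    as the first event of a different window is seen.
    """
    current_window = None
    counts: Counter = Counter()
    for event in events:
        window = (event["ts"] // window_secs) * window_secs
        if current_window is not None and window != current_window:
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window
        counts[event["country"]] += 1
    if current_window is not None:
        yield current_window, dict(counts)  # flush the last open window


def run_example() -> List[Tuple[int, Dict[str, int]]]:
    """Wire the operators into a pipeline over a small, made-up input."""
    events = [
        {"ts": 0, "user_id": "a", "type": "click"},
        {"ts": 5, "user_id": "b", "type": "view"},
        {"ts": 30, "user_id": "b", "type": "click"},
        {"ts": 65, "user_id": "a", "type": "click"},
        {"ts": 70, "user_id": "c", "type": "click"},
    ]
    profiles = {"a": {"country": "HU"}, "b": {"country": "DE"}}
    clicks = filter_events(events, "click")
    enriched = enrich(clicks, profiles)
    return list(count_per_window(enriched, window_secs=60))
```

Each operator consumes one stream and produces another, so stages can be reordered or replaced independently. A real framework would additionally handle partitioning, fault tolerance and out-of-order events, none of which this sketch attempts.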
References
- Apache Samza documentation.
- Tyler Akidau, Robert Bradshaw, Craig Chambers, et al.:
“The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale,
Unbounded, Out-of-Order Data Processing,”
Proceedings of the VLDB Endowment, volume 8, number 12, pages 1792–1803, August 2015.
- Shirshanka Das, Chavdar Botev, Kapil Surlaker, et al.: “All Aboard the Databus!,”
at ACM Symposium on Cloud Computing (SoCC), October 2012.
- Nathan Marz and James Warren: “Big Data: Principles and best practices of scalable
realtime data systems.” Manning, April 2015, ISBN 9781617290343.
- Martin Kleppmann: “Designing data-intensive applications.”
O’Reilly Media, to appear.
- Martin Kleppmann: “Moving faster with data streams: The rise of Samza at LinkedIn.”
14 July 2014.
- Jay Kreps: “Why local state is a fundamental primitive in stream processing.”
31 July 2014.
- Jay Kreps: “I ♥︎ Logs.” O’Reilly Media, September 2014.
- Praveen Neppalli Naga: “Real-time Analytics at Massive Scale with Pinot.”
29 Sept 2014.
- Lili Wu, Sam Shah, Sean Choi, Mitul Tiwari, and Christian Posse:
“The Browsemaps: Collaborative Filtering at LinkedIn,”
at 6th Workshop on Recommender Systems and the Social Web, Oct 2014.