Event sourcing and stream processing at scale

A talk at Domain Driven Design Europe, Brussels, Belgium, 29 Jan 2016

Abstract

If an idea is good, different communities will independently come up with it, but give it different names. For example, the ideas of Event Sourcing and CQRS emerged from the DDD community, while similar ideas appeared under the title of Stream Processing in internet companies such as LinkedIn, Twitter and Google.

This talk attempts to bridge those communities, and works out the commonalities and differences between Event Sourcing and Stream Processing, so that we can all learn from each other.

We will discuss lessons learnt from applying event-based architectures at large scale (over 10 million messages per second) at LinkedIn, and how such systems are implemented using the open-source distributed messaging projects Apache Kafka and Apache Samza. We’ll also discuss some of the architectural choices that affect scalability, both in terms of data throughput and in terms of organisational scalability.

References

  1. Tyler Akidau, Robert Bradshaw, Craig Chambers, et al.: “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing,” Proceedings of the VLDB Endowment, volume 8, number 12, pages 1792–1803, August 2015.
  2. Shirshanka Das, Chavdar Botev, Kapil Surlaker, et al.: “All Aboard the Databus!,” at ACM Symposium on Cloud Computing (SoCC), October 2012.
  3. Pat Helland: “Immutability Changes Everything,” at 7th Biennial Conference on Innovative Data Systems Research (CIDR), January 2015.
  4. Nathan Marz and James Warren: “Big Data: Principles and best practices of scalable realtime data systems.” Manning, April 2015, ISBN 9781617290343.
  5. Martin Kleppmann: “Designing data-intensive applications.” O’Reilly Media, to appear.
  6. Martin Kleppmann and Jay Kreps: “Kafka, Samza and the Unix philosophy of distributed data.” IEEE Data Engineering Bulletin, December 2015.
  7. Jay Kreps: “Why local state is a fundamental primitive in stream processing.” 31 July 2014.
  8. Jay Kreps: “Questioning the Lambda Architecture.” July 2014.
  9. Jay Kreps: “I ♥︎ Logs.” O’Reilly Media, September 2014.
  10. Praveen Neppalli Naga: “Real-time Analytics at Massive Scale with Pinot.” 29 Sept 2014.