Martin Kleppmann
In: Encyclopedia of Big Data Technologies, Springer, March 2018.
Chapter on Apache Samza for the Encyclopedia of Big Data Technologies, ISBN 978-3-319-63962-8.
Apache Samza is an open source framework for distributed processing of high-volume event streams. Its primary design goal is to support high throughput for a wide range of processing patterns, while providing operational robustness at the massive scale required by Internet companies. Samza achieves this goal through a small number of carefully designed abstractions: partitioned logs for messaging, fault-tolerant local state, and cluster-based task scheduling.
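The three abstractions named above can be illustrated with a small toy model. The following is a minimal sketch, not Samza's actual API (Samza is a JVM framework, typically deployed with Kafka and a cluster manager). All names here (`partition_for`, `PartitionedLog`, `CountingTask`, `run_demo`) are hypothetical and chosen only for illustration. The sketch assumes messages are routed to partitions by a hash of their key, that each partition is processed by exactly one task, and that each task makes its local state recoverable by writing every update to a changelog.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash so the same key always lands in the same partition.
    # (Python's built-in hash() is randomized per process, so crc32 is used.)
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedLog:
    """Toy in-memory stand-in for a partitioned, append-only message log."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str):
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        # Return the partition and the offset of the new message within it.
        return p, len(self.partitions[p]) - 1


class CountingTask:
    """Toy task: counts messages per key in local state.

    Every state update is also appended to a changelog, so the state can be
    rebuilt after a failure by replaying that changelog.
    """

    def __init__(self, changelog=None):
        self.state = defaultdict(int)
        self.changelog = changelog if changelog is not None else []

    def process(self, key: str, value: str) -> None:
        self.state[key] += 1
        self.changelog.append((key, self.state[key]))

    @classmethod
    def restore(cls, changelog):
        # Replay the changelog; the last write for each key wins.
        task = cls(changelog=list(changelog))
        for key, count in changelog:
            task.state[key] = count
        return task


def run_demo():
    log = PartitionedLog(num_partitions=4)
    for user in ["alice", "bob", "alice", "carol", "alice"]:
        log.append(user, "page_view")

    # "Scheduling" is reduced here to assigning one task per partition.
    tasks = [CountingTask() for _ in log.partitions]
    for task, partition in zip(tasks, log.partitions):
        for key, value in partition:
            task.process(key, value)
    return log, tasks
```

Because all messages for a given key go to the same partition, the task that owns that partition can keep the key's count in local state without coordinating with other tasks. Calling `CountingTask.restore(task.changelog)` stands in for recovery on another machine after a failure. A real deployment would add durable storage, offset checkpointing, and a cluster scheduler, none of which are modeled here.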