Apache Samza
Martin Kleppmann
In: Encyclopedia of Big Data Technologies, Springer, March 2018.
Chapter on Apache Samza for the Encyclopedia of Big Data Technologies, ISBN 978-3-319-63962-8.
Abstract
Apache Samza is an open-source framework for distributed processing of high-volume event streams.
Its primary design goal is to support high throughput for a wide range of processing patterns, while
providing operational robustness at the massive scale required by Internet companies. Samza
achieves this goal through a small number of carefully designed abstractions: partitioned logs for
messaging, fault-tolerant local state, and cluster-based task scheduling.