Apache Samza

Martin Kleppmann

In: Encyclopedia of Big Data Technologies, Springer, March 2018.

Chapter on Apache Samza for the Encyclopedia of Big Data Technologies, ISBN 978-3-319-63962-8.

Abstract

Apache Samza is an open-source framework for distributed processing of high-volume event streams. Its primary design goal is to support high throughput for a wide range of processing patterns, while providing operational robustness at the massive scale required by Internet companies. Samza achieves this goal through a small number of carefully designed abstractions: partitioned logs for messaging, fault-tolerant local state, and cluster-based task scheduling.
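
To make two of these abstractions concrete, the following is a minimal in-memory sketch of a partitioned log and of task-local state that is made fault-tolerant by replaying a changelog. It is not Samza's API: the names `PartitionedLog`, `CountingTask`, and `run_example` are hypothetical, the CRC32-based partitioning is an assumption chosen for determinism, and cluster-based task scheduling is not modeled at all.

```python
import zlib


class PartitionedLog:
    """Append-only log split into a fixed number of partitions (hypothetical)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Deterministic hash, so a given key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class CountingTask:
    """Consumes one partition and keeps per-key counts in local state."""

    def __init__(self, changelog):
        self.state = {}
        self.changelog = changelog  # list of (key, new_count) records

    def process(self, key, value):
        self.state[key] = self.state.get(key, 0) + 1
        # Record every state update so the state can be rebuilt after a crash.
        self.changelog.append((key, self.state[key]))

    @classmethod
    def restore(cls, changelog):
        # Rebuild local state by replaying the changelog from the beginning.
        task = cls(changelog)
        for key, count in changelog:
            task.state[key] = count
        return task


def run_example():
    log = PartitionedLog(num_partitions=2)
    for user in ["alice", "bob", "alice", "carol", "alice"]:
        log.append(user, "page_view")

    # One task and one changelog per input partition.
    changelogs = [[] for _ in log.partitions]
    tasks = [CountingTask(changelogs[p]) for p in range(len(log.partitions))]
    for p, partition in enumerate(log.partitions):
        for key, value in partition:
            tasks[p].process(key, value)

    # Simulate losing the task that owns "alice" and restoring it elsewhere.
    p = log.partition_for("alice")
    recovered = CountingTask.restore(changelogs[p])
    return tasks[p].state, recovered.state
```

Because all messages with the same key land in the same partition, a single task sees every event for that key and can keep its count locally; replaying the changelog is what lets a replacement task pick up with the same state.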