Upcoming conference talks about Samza

Published by Martin Kleppmann on 28 Aug 2014.

After my talk about Samza fault tolerance at Berlin Buzzwords was well received a few months ago, I submitted several more talk proposals to a variety of conferences. To my surprise, all the proposals were accepted, so I’m now going to have a fairly busy time in the next few months!

Here are the four conferences at which I’ll be speaking between September and November. All the talks are about Apache Samza, the stream processing project I’ve been working on. However, all the talks are different, each focussing on a different aspect and perspective.

If you don’t yet have a ticket for these conferences, there are a few discount codes below. Hope to see you there :-)

Turning the database inside out with Apache Samza
Strange Loop, September 18–19 in St. Louis, Missouri. (Lanyrd, Twitter)

The Strange Loop conference explores the future of software development from a wonderfully eclectic range of viewpoints, ranging from functional programming to distributed systems. In this talk I’ll discuss the potential of stream processing as a fundamental programming model, which has big advantages compared to the way we usually build applications today.

Building real-time data products at LinkedIn with Apache Samza
Strata + Hadoop World, October 15–17 in New York. (Lanyrd, Twitter)
Use discount code SPEAKER20 to get 20% off.

MapReduce and its cousins are powerful tools for building data products such as recommendation engines, detecting anomalies and improving relevance. However, with batch processing there may be a delay of several hours before new data is reflected in the output. With stream processing, you can potentially respond in seconds rather than hours, but you have to learn a whole new way of thinking in order to write your jobs. In this talk I’ll discuss some real-life examples of stream processing at LinkedIn, and show how to use Samza to solve real-time data problems.

Staying agile in the face of the data deluge
Span conference, October 28 in London, UK. (Lanyrd, Twitter)
Use this link to get a 20% discount.

An often-overlooked but important aspect of tools is their plasticity: if your application’s requirements change, how easily do the tools let you adapt your existing code and data to the new requirements? Samza is designed with plasticity in mind. In this talk I’ll discuss how re-processing of data streams can keep your application development agile.

Scalable stream processing with Apache Samza and Apache Kafka
ApacheCon Europe, November 17–21 in Budapest, Hungary. (Lanyrd, Twitter)

Many of the most important open source data infrastructure tools are projects of the Apache Software Foundation: Hadoop, ZooKeeper, Storm and Spark, to name just a few. In this talk I’ll focus on how Samza and Kafka (also Apache projects) fit into this lively open source ecosystem.

Background reading

If you don’t yet know about Samza, don’t worry: I’ll start each talk with a quick introduction to Samza, and not assume any prior knowledge.

But if you want to ask smart-ass questions and embarrass me in front of the audience, you can begin by reading the Samza documentation (thoroughly updated over the last few months by yours truly), and start thinking of particularly tricky questions to ask.

You may also be interested in this excellent series of articles by Jay Kreps, which are relevant to the upcoming talks: