Using logs to build a solid data infrastructure
A talk at Craft Conf, Budapest, Hungary, 24 Apr 2015
Abstract
How does your database store data on disk reliably? It uses a log.
How does one database replica synchronise with another replica? It uses a log.
How does a distributed algorithm like Raft achieve consensus? It uses a log.
How does activity data get recorded in a system like Apache Kafka? It uses a log.
How will the data infrastructure of your application remain robust at scale? Guess what…
Logs are everywhere. I’m not talking about plain-text log files (such as syslog or log4j) – I mean
an append-only, totally ordered sequence of records. It’s a very simple structure, but it’s also
a bit strange at first if you’re used to normal databases. However, once you learn to think in terms
of logs, many problems of making large-scale data systems reliable, scalable and maintainable
suddenly become much more tractable.
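
To make "append-only, totally ordered sequence of records" concrete, here is a minimal in-memory sketch in Python. The `Log` class and its `append` and `read_from` methods are hypothetical names chosen for illustration; they are not the API of Kafka, Raft, or any database, and a real log would also persist records to disk and replicate them.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Log:
    """Hypothetical in-memory log: records are only appended, never modified."""

    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Add a record at the end and return its offset in the total order."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[Any]:
        """Return all records at or after the given offset, in log order."""
        return self._records[offset:]


log = Log()
first = log.append({"user": "alice", "action": "login"})
second = log.append({"user": "bob", "action": "view"})
```

Because offsets only ever increase, a consumer that remembers the last offset it processed can resume by calling `read_from` with the next offset, and every consumer sees the records in the same order.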
Drawing from the experience of building scalable systems at LinkedIn and other startups, this talk
will explore why logs are such a fine idea: making it easier to maintain search indexes and caches,
making your applications more scalable and more robust in the face of failures, and opening up your
data for richer analysis, while avoiding race conditions, inconsistencies and other ugly problems.