A simple application may start out with one database, but as you scale and add features, your data
infrastructure usually grows into a tangled mess of datastores, replicas, caches, search indexes, analytics systems, and
message queues. When new data is written, how do you make sure it ends up in all the right places?
If something goes wrong, how do you recover?
Change Data Capture (CDC) is an old idea: let the application subscribe to a stream of everything
that is written to a database – a feed of data changes. You can use that feed to update search
indexes, invalidate caches, create snapshots, generate recommendations, copy data into another
database, and so on. For example, LinkedIn’s Databus and Facebook’s Wormhole do this. But the idea
is not as widely known as it should be.
In this talk, I will explain why change data capture is so useful, and how it prevents race
conditions and other ugly problems. Then I’ll go into the practical details of implementing CDC
with PostgreSQL and
Apache Kafka, and discuss the approaches you can use to do the same with
various other databases.
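As a taste of those practical details, here is a minimal sketch of a PostgreSQL-to-Kafka pipeline,
assuming psycopg2 and kafka-python on the client, `wal_level=logical` and the wal2json output
plugin on the PostgreSQL server, and a Kafka broker on localhost; the slot name, topic name, and
connection settings are placeholders:

```python
import psycopg2
import psycopg2.extras
from kafka import KafkaProducer

# Open a logical replication connection (requires wal_level=logical and the
# wal2json output plugin installed on the PostgreSQL server).
conn = psycopg2.connect(
    "dbname=mydb",  # placeholder connection string
    connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
cur = conn.cursor()
cur.create_replication_slot("cdc_slot", output_plugin="wal2json")
cur.start_replication(slot_name="cdc_slot", decode=True)

producer = KafkaProducer(bootstrap_servers="localhost:9092")

def publish(msg):
    # Forward each decoded change (a JSON document from wal2json) to Kafka,
    # then acknowledge it so PostgreSQL can reclaim the WAL segments.
    producer.send("mydb.changes", msg.payload.encode("utf-8"))
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(publish)  # blocks, invoking publish() once per change
```

In a real pipeline you would wait for Kafka to acknowledge each write before sending feedback to
PostgreSQL, so that a crash in between cannot silently drop changes.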