
2023 year in review

Published by Martin Kleppmann on 04 Jan 2024.

A lot has happened in the last year, so I thought it would be good to write up a review.

My biggest change in 2023 was that my wife and I had a baby! This has brought a mixture of joys and frustrations, but overall it has been very good. I took three months of full-time parental leave after the birth, and since going back to work I’ve been sharing the parenting responsibilities with my partner. Family has therefore been my top priority, but I won’t talk much about family things in this post, since I prefer to keep it private. Lots of work things happened as well:

New job!

As of January 2024 I have a new job as Associate Professor in Cambridge! Unlike my previous academic positions, which were all fixed-term contracts of a few years, this is a permanent position. A huge number of people apply for this sort of position, and so I feel very fortunate that my colleagues had faith in my work and decided to choose me.

(Technically, I have to pass a 5-year probation period before the position becomes permanent, but I’m told that this is mostly a formality, and nothing like the problematic tenure-track system in the US.)

I’ve arranged to work part-time (65%) for the first year on the job, so that I can do a greater share of the parenting duties until our child goes to nursery (which we’re hoping will be in approximately a year’s time). Partly for this reason I’ve not been given any teaching duties for this academic year. However, I’ve been asked to offer a new master’s module for next year, which will take some effort to prepare. I’m planning to do it on cryptographic protocols.

I had only started my previous job at TU Munich in October 2022, so it’s a bit strange to leave again after just over a year. However, Cambridge is better for us for family reasons, and Cambridge was offering a permanent position whereas my job at TU Munich was fixed-term, so it made sense to move back to Cambridge.

The biggest downside of moving is that I have lost the grant that brought me to Munich in the first place (since that grant requires me to be at a German university). That’s a shame, because it was a lot of money – enough for two PhD students and a postdoc for several years. One of my first activities in Cambridge will therefore be to start applying for new grants. C’est la vie (académique).

Research papers and projects

I had one big paper acceptance in 2023: our article “Pudding: Private User Discovery in Anonymity Networks” (with Ceren Kocaoğullar, Daniel Hugenroth, and Alastair Beresford) was accepted at the IEEE Symposium on Security and Privacy, which will take place in May 2024. This paper solves a problem with the Loopix/Nym anonymity network: previously you had to somehow find out someone’s public key in order to contact them on the network, and our work makes it possible to contact people via a short, friendly username instead (while preserving the security properties of the anonymity network).

Matthew Weidner and I went through several iterations of our paper “The Art of the Fugue: Minimizing Interleaving in Collaborative Text Editing”. The latest version is currently under submission at a journal, and a preprint is available on arXiv. This paper tackles a problem in many collaborative text editing algorithms: when different users insert text at the same place in a document (especially while working offline), the algorithms may mix up text from the different users. Our paper shows how to solve this problem.
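To make the interleaving problem concrete, here is a minimal TypeScript sketch. It assumes a deliberately naive, hypothetical list CRDT (not Fugue, not Automerge, and not any real library) in which each character gets a fractional position, and characters that end up with equal positions are ordered by replica ID. Two users, hypothetically called alice and bob, each type a word into an empty document while offline, and their edits are then merged:

```typescript
// Hypothetical, deliberately naive list CRDT: every character carries a
// fractional position, and ties between equal positions are broken by replica ID.
interface Char {
  pos: number;     // fractional position between 0 (start) and 1 (end)
  replica: string; // ID of the user who inserted the character
  value: string;   // the character itself
}

// Simulate one user typing `text` into an empty document, offline.
// Each new character is placed halfway between the previous character
// and the end-of-document sentinel at position 1.
function typeText(replica: string, text: string): Char[] {
  const chars: Char[] = [];
  let prev = 0; // start-of-document sentinel
  for (const value of text) {
    const pos = (prev + 1) / 2;
    chars.push({ pos, replica, value });
    prev = pos;
  }
  return chars;
}

// Merge two replicas by sorting all characters on (position, replica ID).
function merge(a: Char[], b: Char[]): string {
  return [...a, ...b]
    .sort((x, y) => x.pos - y.pos || x.replica.localeCompare(y.replica))
    .map((c) => c.value)
    .join("");
}

const merged = merge(typeText("alice", "Hello"), typeText("bob", "World"));
console.log(merged); // prints "HWeolrllod"
```

Each user’s word is contiguous on their own device, but because both users independently generate the same sequence of positions, the merged document interleaves the two words character by character. That outcome is exactly the kind of anomaly that interleaving-minimising algorithms are designed to avoid.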

The paper “Upwelling: Combining Real-time Collaboration with Version Control for Writers” (with Rae McKelvey, Scott Jenson, Eileen Wagner, and Blaine Cook) appeared on the Ink & Switch website in March. We also submitted it to an academic conference, but it was rejected, so we’re just keeping it as a web article. The paper describes a prototype rich text editor that combines Google-Docs-style real-time collaboration with Git-style version control features (branching, merging, diffing, and editing history).

My master’s student Liangrun Da published “Extending JSON CRDT with move operations”, a report from a research project he did with me in 2023. The goal of this project was to develop a move operation for Automerge, which could be used to reorder items in a list, or to move a subtree of a JSON document to a different location in the tree. The algorithm is not yet fully implemented within Automerge, but we’re hoping to get there this year.

My other master’s student Leo Stewen’s report “Undo and Redo Support for Replicated Registers” describes another algorithm prototype for Automerge – this one aiming to add support for undo and redo. This also turns out to not be entirely straightforward, especially when you consider the interaction with all the other features of Automerge.

Industrial collaborations: Automerge and Bluesky

I’ve continued my long-standing collaboration with Ink & Switch, in particular around the Automerge open-source project. Alex Good, who is funded by Automerge sponsors and my Patreon supporters, works full-time to maintain the project for our industrial users, while several others at Ink & Switch and in the open source community have been making valuable contributions. I’ve moved into an advisory role and haven’t been writing any actual code for the project lately.

The two biggest milestones for Automerge in 2023 were:

  • The release of Automerge 2.0, the rewrite of the original JavaScript code base in Rust. This has enabled huge performance improvements, and also made Automerge available on many more platforms: we compile Rust to WebAssembly and have a TypeScript/JavaScript wrapper for web browsers and Node.js, but we can also compile Rust to a native library and call it from C, Go, Swift/iOS, Java/Android, and others. The idea is to implement the hairy, performance-critical CRDT logic once in Rust, and then to have wrapper APIs for all common programming languages that all share the same data format and interoperate.
  • Whereas Automerge itself is only an in-memory data structure library with no I/O, Automerge-Repo now provides out-of-the-box integrations with persistent storage (e.g. IndexedDB in a browser, and the filesystem in native apps) and with network protocols (e.g. WebSocket). Moreover, Automerge-Repo provides integrations with frontend libraries (e.g. React and Svelte). Previously app developers had to figure out all of this for themselves, so Automerge-Repo is a huge step forward in terms of making it easier to build applications on top of Automerge.

My other ongoing industrial collaboration is with Bluesky, a decentralised social network/protocol. Bluesky has had a tremendously successful year: launched into private beta in early 2023, it has grown to 3 million users by the end of the year. I’ve been advising the team since the beginning (they started development about two years ago) on topics around scalability, protocol design, architecture, and security.

I also helped them write a research paper describing the Bluesky architecture and comparing it to other decentralised social protocols; we’ll be publishing that paper sometime in the next few months. I personally think Bluesky and the underlying AT Protocol do many things much better than the alternatives, such as Mastodon/ActivityPub, and they have a real chance of becoming a mainstream Twitter successor. Bluesky wants to come out of private beta and open up public federation early this year; it’s going to be an exciting time.

I still have some Bluesky invitation codes to give out. If I know you personally, feel free to send me an email and I’ll send you a code. (Sorry, I don’t have enough codes to give out to people I don’t know.)

Events, conferences, workshops

I co-organised three events last year:

  • The first summer school on Distributed and Replicated Environments (DARE) in Brussels, Belgium. We had 40 master’s and PhD students from all over Europe, and a few from further afield as well. I gave four hours of lectures (plus lots more time spent in informal conversations), and I think we succeeded in getting the students excited about research in distributed systems. One of the attending master’s students is now applying to do a PhD with me in Cambridge.
  • An unconference on local-first software in St. Louis, MO, USA the day after Strange Loop. We had space for about 100 people and the event sold out surprisingly quickly. Sadly I couldn’t be there because I caught covid at the summer school, but my co-organisers told me that there were excellent discussions among the attendees. Notes and photos from the event have been collected in this Git repository.
  • The Programming Local-First Software (PLF) workshop at SPLASH 2023 in Cascais, Portugal. This event aims to bring together industrial practitioners with researchers in the area of programming language design to discuss ways of improving how local-first software is developed. The event included a keynote by Brooklyn Zelenka, and we had 15 submissions from which we were able to build an interesting and varied programme of talks.

I also gave several public talks in 2023:

  • At the GOTO Amsterdam conference in June (recording) I gave a talk introducing Automerge and local-first software to an audience of industrial software engineers, and I repeated the talk at the Amsterdam Elixir meetup.
  • At the Strange Loop conference in September (recording) I spoke about the research we’ve done over the last few years on collaborative text editing, especially bringing together real-time collaboration with Git-style version control: diffing, branching, and merging (featuring Upwelling, Automerge, and Peritext). I had to give the talk remotely and I couldn’t see or hear the room, but I’m told that it was full, with standing room only.
  • At the KASTEL Distinguished Lecture Series in Karlsruhe, Germany (recording) I spoke about the security challenges that arise when you try to make collaboration software peer-to-peer and have to make it work even though you don’t know who you can trust.
  • At the ACM Tech Talks series (recording) I gave a repeat of my GOTO Amsterdam talk, and there was a lively Q&A session afterwards with lots of good questions. There was a good turnout: around 400 people watched the talk live.
  • At the IETF Decentralization of the Internet Research Group I gave a talk about local-first software. My collaborators and I have been discussing the idea of eventually developing open standards for the protocols around local-first software (right now it’s still too early, so this would be something to consider once those protocols have matured a bit). I’m hoping that this talk might be the beginning of a process of engagement that could eventually lead to such a standardisation effort.

Designing Data-Intensive Applications

My book continues to sell well, with now over 230,000 copies sold, and reviews continue to be very positive. However, it is gradually showing its age – it was published in 2017, but I wrote the first few chapters around 2014/15, so they are now almost a decade old. Moreover, I have learnt a lot in the meantime, and there are quite a few things in the book that I would now say differently.

For that reason, I have been working on a second edition that brings the book up-to-date. However, my progress has been very slow, as I’ve had to fit in the research and writing for the second edition alongside my various other work and family commitments. I actually already agreed to do the second edition with O’Reilly in 2021, and the full manuscript was supposed to be complete by January 2023. Well… that didn’t quite happen as planned.

In fact, I only properly started writing in 2023, and so far I’ve only completed the revision of the first three chapters. I’m much happier with the revised version, but it takes a lot of time to do such thorough revisions, so I’m not even going to try to give an updated completion date. I’d much rather take the time to make it good, however long it takes, than rush to meet some artificial deadline. And I’m in the lucky situation where I can get away with such a stance.

In case you’re wondering what’s changing in the second edition: I’m keeping the high-level structure and topics quite similar, but I’m rewriting a lot of the actual text to be easier to follow and more nuanced. I also collected a lot of reference material over the years (books, papers, blog posts, etc.); a large part of my time is spent reading that material and incorporating it into the narrative.

The biggest technological change since the first edition is probably the growth of hosted cloud services, which are a much bigger thing now than they were a decade ago, and the resulting rise of “cloud-native” architecture. Other things: NoSQL as a buzzword is dead (though many of its ideas have been absorbed into mainstream systems), MapReduce is dead (replaced by cloud data warehouses, data lakes, and things like Spark), and GDPR arrived (though the degree to which it is influencing data systems architecture is still somewhat open).

Local-first is taking off

Together with some colleagues from Ink & Switch I coined the term “local-first” in 2019 to describe the type of software we wanted to enable with Automerge and related projects. Initially the term was mostly used by ourselves and our direct collaborators, but in 2023 we have seen the idea catching on much more widely.

It’s exciting that so many people are buying into the idea. Over the coming years I hope we will continue to grow this community, and realise the advantages of the local-first approach in a broader range of software.