Student Projects

I supervise student projects in my area of research at all levels: Part II, MPhil/Part III, and PhD. If you’re bright, motivated, and excited about the topics below, come join my research group!

I’m a new associate professor, having started only in 2024, but I have a fair bit of experience supervising student projects:

Part II: Nicole Choong, Simeon Stoykov, David Berry, Jack Wickham, Elias Calocane
MPhil/master’s: Liangrun Da, Ceren Kocaoğullar, Eeo Jun, Matthew Weidner
Research interns: Emilie Ma, Leo Stewen, Liangrun Da
PhD: Liangrun Da (incoming), Jessica Man (ongoing)

I’ve also examined 6 PhD theses and 4 MPhil dissertations so far, so I think I know what a good thesis looks like.

Research areas

The overarching theme of my research is decentralisation of Internet services: that is, trying to move beyond the dominance of a few big tech companies, and towards models where users have better control over their own data. Concretely, I’m working on:

Local-first, a new model for collaboration software (think real-time collaboration in the style of Google Docs) that works without centralised cloud services. I co-created Automerge, which powers a growing number of production collaboration tools, and I coined the term “local-first”, which is increasingly becoming an industry trend with a conference, podcast, newsletter, etc.
Decentralised social media, by advising social network Bluesky and contributing to the development of the underlying AT Protocol.
Decentralised and privacy-preserving approaches to carbon emissions accounting and reporting, as well as tracking other sustainability issues such as deforestation across supply chains. (This is a new research area, so I haven’t written much about it yet.)

My approach to research:

I connect theory with industrial practice. That is, I like to go all the way from a mathematical proof of an algorithm’s correctness to working with industrial collaborators to implement and deploy the algorithm in real-world systems.
I combine techniques from many areas of computer science: distributed systems, databases, formal verification, cryptography, and human-computer interaction.
I believe that doing good work requires a range of different perspectives. I therefore cultivate an inclusive environment, and collaborate with a diverse set of people, both inside and outside academia.
Encouragement and collaboration yield better results than pressure and competition.

As your supervisor, I will help you refine a project idea and provide guidance to make your project successful, but you’ll also have the freedom to choose your own path. I’ll try my best to answer your questions and unblock you if you’re stuck. I will help you get good at writing and presenting your ideas.

Project ideas

The following are some brief outlines of projects that I think could be promising. If you want to know more about one of them, please get in touch. I’m also happy for you to propose your own project, as long as it falls within my research area.

Decentralised access control. Level: Part II. Status: available

Imagine you have your phone and your laptop that are both authorised devices on your account; then your phone revokes your laptop’s access, and concurrently your laptop revokes your phone’s access. What should happen? In a centralised system, whichever revocation reaches the server first wins, but in a decentralised system that approach doesn’t work. In addition, we have to assume that some of the operations might be malicious (maybe your phone was stolen and the thief is now using it to access your account). Solving this problem robustly turns out to be a surprisingly challenging problem. We have a draft paper (available on request) that attempts a solution; this project is to implement and evaluate the algorithm from that paper.

CRDT-based version control. Level: Part II. Status: one project in progress

CRDTs like Automerge and Eg-walker allow multiple users to concurrently edit a text document, and to merge those edits automatically. They work by internally recording a keystroke-granularity editing history of the document. In principle, this enables them to offer version control features (like comparing versions of a file from different points in time), but in practice they do not yet implement this. The project is to implement a version control system based on one of these algorithms. It could efficiently support keystroke-granularity editing history, whereas Git would become very inefficient if you made a separate commit for every keystroke.

Benchmarking Operational Transformation. Level: Part II/MPhil. Status: one project in progress

In a recent paper we examined how different collaborative editing algorithms handle not only real-time collaboration, but also asynchronous collaboration, where users work on their own branch for a while before merging with their collaborators’ branches. We tested with one OT algorithm, which turned out to be extremely slow on asynchronous traces. This project is to explore this performance characteristic more thoroughly by testing a wider selection of algorithms (e.g. ShareDB) and datasets. You may have to implement some algorithms yourself, as some older ones are not available as peudocode, and collect additional datasets.

Byzantine fault tolerant set reconciliation. Level: Part II/MPhil. Status: available

Set reconciliation is a protocol in which two parties each have a set of items (e.g. files), and they want to figure out which items they need to send to each other so that at the end both have all of the items. Set reconciliation is very useful for data sync between collaborating users. A number of such algorithms have been published recently, including Rateless IBLTs, RBSR, BFT Set Reconciliation, minisketch (paper), hash history, and BEC. Many of them aim to be Byzantine fault-tolerant, i.e. resilient against corruption by malicious parties. The goal of this project is to benchmark these algorithms, and to compare their trade-offs in terms of performance, security, and other characteristics.

Formalising Git merge. Level: MPhil/PhD. Status: one project in progress

Even though huge numbers of people use Git every day, and many have a rough idea of how it works internally, there are some parts of it that very few people really understand. One such dark corner is how exactly Git performs merges, rebases, and cherry-picks. There are multiple merge strategies, but they are barely documented. Back in 2007 a formal model of 3-way merges was published, but I believe Git’s merge strategies are more sophisticated than what that model describes. This project is to develop a formal model that precisely describes how Git merges work, so that we can reason about them and prove properties (for example, does merging A into B always produce the same result as merging B into A?). As an extension, we can also consider less mainstream version control systems such as Pijul and Darcs.

Expressing CRDTs using Datalog. Level: MPhil/PhD. Status: one project in progress

CRDTs are an essential building block for local-first software, but they are difficult to get right. To be sure that they are correct, we have to prove their commutativity, which is a lot of work. One idea for solving this problem is to define CRDTs using Datalog, a subset of Prolog that was designed for expressing database queries. It would have the nice property that every algorithm written in the language would be guaranteed to converge! However, some CRDTs probably can’t be expressed using Datalog with stratified negation, and this project could explore where that boundary lies. Achieving good performance will require a Datalog engine that supports incremental view maintenance, of which there aren’t many, so this project may end up having to adapt an existing Datalog engine for the required workload.

App-specific conflict detection. Level: MPhil/PhD. Status: available

When multiple users concurrently edit the same file and then merge their changes, CRDTs guarantee that those users will automatically converge to the same state. However, that merged state might not be what they wanted; for example, in a graphics app, two users might have edited nearby objects in a way that the merged result is undesirable (e.g. the objects obscure each other). The problem is that the CRDT doesn’t know which objects are “near” each other, since that’s an application-defined notion, and so it can’t warn the user about the potentially bad merge. This project is to design and implement a mechanism for applications to query the editing history of a file in order to detect when concurrent edits may have a semantic conflict, in order to notify the user and involve them in manually resolving such conflict cases.

Cut&paste in collaborative text editors. Level: MPhil/PhD. Status: available

Most collaborative text editors support inserting and deleting text, but they don’t have explicit support for moving a block of text from one place to another. Cutting and pasting a section is done by deleting the text from the old location and inserting a fresh copy of the section in the new location. If another user concurrently edits that section, their edits are either lost or attached to the old location. It would be nicer if those edits were instead applied to the section in its new location. This project is to design, implement, and verify a collaborative text editing algorithm that supports moving blocks of text.

That’s just a selection of ideas… I have many more, but this page is getting too long!

For students outside of Cambridge

Please note that I advise students at other universities only in very exceptional cases. Sorry, there just aren’t enough hours in the day.

Student Projects

Research areas

Project ideas

For students outside of Cambridge

Subscribe

My book