Prediction: AI will make formal verification go mainstream
Published by Martin Kleppmann on 08 Dec 2025.
Much has been said about the effects that AI will have on software development, but there is an
angle I haven’t seen talked about: I believe that AI will bring formal verification, which for
decades has been a bit of a fringe pursuit, into the software engineering mainstream.
Proof assistants and proof-oriented programming languages such as Rocq,
Isabelle, Lean,
F*, and Agda have been around for a long
time. They make it possible to write a formal specification that some piece of code is supposed to
satisfy, and then mathematically prove that the code always satisfies that spec (even on weird
edge cases that you didn’t think of testing). These tools have been used to develop some large
formally verified software systems, such as an operating system kernel,
a C compiler, and a
cryptographic protocol stack.
At present, formal verification is mostly used by research projects, and it is
uncommon for industrial software
engineers to use formal methods (even those working on classic high-assurance software such as
medical devices and aircraft). The reason is that writing those proofs is both very difficult
(requiring PhD-level training) and very laborious.
For example, as of 2009, the formally verified seL4 microkernel consisted of 8,700 lines of C code,
but proving it correct required 20 person-years and
200,000 lines of Isabelle
code – or 23 lines of proof and half a person-day for every single line of implementation. Moreover,
there are maybe a few hundred people in the world (wild guess) who know how to write such proofs,
since it requires a lot of arcane knowledge about the proof system.
To put it in simple economic terms: for most systems, the expected cost of bugs is lower than the
expected cost of using the proof techniques that would eliminate those bugs. Part of the reason is
perhaps that bugs are a negative externality: it’s not the software developer who bears the cost of
the bugs, but the users. But even if the software developer were to bear the cost, formal
verification is simply very hard and expensive.
At least, that was the case until recently. Now, LLM-based coding assistants are getting pretty good
not only at writing implementation code, but also at
writing
proof scripts in
various languages. At present, a human with specialist
expertise still has to guide the process, but it’s not hard to extrapolate and imagine that process
becoming fully automated in the next few years. And when that happens, it will totally change the
economics of formal verification.
If formal verification becomes vastly cheaper, then we can afford to verify much more software. But
on top of that, AI also creates a need to formally verify more software: rather than having humans
review AI-generated code, I’d much rather have the AI prove to me that the code it has generated is
correct. If it can do that, I’ll take AI-generated code over handcrafted code (with all its
artisanal bugs) any day!
In fact, I would argue that writing proof scripts is one of the best applications for LLMs. It
doesn’t matter if they hallucinate nonsense, because the proof checker will reject any invalid proof
and force the AI agent to retry. The proof checker is a small amount of code that is itself
verified, making it virtually impossible to sneak an invalid proof past the checker.
That doesn’t mean software will suddenly be bug-free. As the verification process itself becomes
automated, the challenge will move to correctly defining the specification: that is, how do you know
that the properties that were proved are actually the properties that you cared about? Reading and
writing such formal specifications still requires expertise and careful thought. But writing the
spec is vastly easier and quicker than writing the proof by hand, so this is progress.
I could also imagine AI agents helping with the process of writing the specifications, translating
between formal language and natural language. Here there is the potential for subtleties to be lost
in translation, but this seems like a manageable risk.
I find it exciting to think that we could just specify in a high-level, declarative way the
properties that we want some piece of code to have, and then to vibe code the implementation along
with a proof that it satisfies the specification. That would totally change the nature of software
development: we wouldn’t even need to bother looking at the AI-generated code any more, just like we
don’t bother looking at the machine code generated by a compiler.
In summary: 1. formal verification is about to become vastly cheaper; 2. AI-generated code needs
formal verification so that we can skip human review and still be sure that it works; 3. the
precision of formal verification counteracts the imprecise and probabilistic nature of LLMs. These
three things taken together mean formal verification is likely to go mainstream in the foreseeable
future. I suspect that soon the limiting factor will not be the technology, but the culture change
required for people to realise that formal methods have become viable in practice.
If you found this post useful, please
support me on Patreon
so that I can write more like it!
To get notified when I write something new,
follow me on Bluesky or
Mastodon,
or enter your email address:
I won't give your address to anyone else, won't send you any spam, and you can unsubscribe at any time.