The Thing We All Obviously Want

Generated by AI—notice the perspective.

Over the past year, AI-assisted programming has developed at an astounding pace. Even five years ago, fully automated synthesis of large-scale production systems would have seemed unthinkable. Today it is not merely an ambition; by some measure, it is a reality. To some computer scientists, natural-language-driven program synthesis was always the endgame. On the other hand, the software I use day-to-day doesn’t seem to be getting appreciably better overall. Systems are still broken, apps are still unresponsive (even on well-resourced hardware), crashes are still common, and interfaces are generally as clunky as before. Personally, I believe we will eventually see many systems adapted by AI-assisted refactoring tools, but I also recognize that there are human barriers to deploying those tools at full thrust in the short (even medium) term.

In any case, my position is that AI-assisted programming, giving us real-time, on-demand generation of any app, is the thing that we all obviously want. There are a few tensions with this reality: (a) it seriously changes the value proposition of what “code” is, (b) there are externalities, such as wasted computation and energy spent replicating junk, and (c) it challenges the role of humans in the knowledge-generation process.

Note: In the rest of this essay I will use the term “LLM(s).” In general, when I say this, I mean a state-of-the-art integration of a frontier model alongside relatively simple tools (e.g., Claude Code, Codex, etc.). There is some nuance in building these tools, but given that the innovation is the model, I will casually refer to the whole agentic process as the “LLM.”

Program Synthesis: Did it Fail?

Traditional program synthesis (by which I generally mean SMT-based, search-based, or similar approaches) used rigorous, formal enumeration or proof search to produce a synthesized program, potentially with a certificate of its correctness, driven by a rigorous specification. Like many academic fields, program synthesis aimed not only to produce fully automated programming tools but also to advance the frontier of understanding in semantics, verification, specification, etc. These were the challenging problems, especially given that traditional search (on the CPU) was so slow.
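To make the search-based flavor concrete, here is a minimal sketch of enumerative synthesis. It is not any particular tool's algorithm: the tiny expression language, the names (`LEAVES`, `BINARY_OPS`, `synthesize`), and the use of input/output examples as the specification are all assumptions I am making for illustration. Real systems prune the search far more aggressively and often hand candidates to an SMT solver instead of testing them.

```python
from itertools import product

# Hypothetical toy DSL: integer expressions over a single variable x.
LEAVES = ["x", "1", "2"]
BINARY_OPS = ["+", "*", "-"]


def enumerate_exprs(depth):
    """Yield DSL expressions (as strings) with nesting depth <= depth."""
    if depth == 0:
        yield from LEAVES
        return
    smaller = list(enumerate_exprs(depth - 1))
    yield from smaller
    for op, left, right in product(BINARY_OPS, smaller, smaller):
        yield f"({left} {op} {right})"


def satisfies(expr, examples):
    """Check a candidate against every (input, expected output) pair."""
    return all(
        eval(expr, {"__builtins__": {}}, {"x": x}) == y for x, y in examples
    )


def synthesize(examples, max_depth=2):
    """Return the first enumerated expression consistent with the examples."""
    for depth in range(max_depth + 1):
        for expr in enumerate_exprs(depth):
            if satisfies(expr, examples):
                return expr
    return None  # nothing in the bounded search space fits the spec


# The "specification" here is just three points on the line 2*x + 1.
SPEC = [(0, 1), (1, 3), (2, 5)]
result = synthesize(SPEC)
print(result)
```

Even at depth 2 this loop tests a few thousand candidates, and each extra level of nesting multiplies that count, which is one way to see why CPU-bound search was the bottleneck.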

LLMs allow rigorous concepts to gracefully degrade by using text. The underlying model has such a deep understanding of language that fuzzy, hazily-posed descriptions often still receive some sensible interpretation. The obvious issue is hallucination: when you push the embedding space into some inconsistency, won’t it just generate junk? Of course, this is absolutely an issue, but when the error rate is low enough that the tool is practically useful, many people will not care.

My position is that LLM-guided software engineering was so wildly successful not just because it nailed the generation part, but also because LLMs ended up practically solving the problem of specification. Humans are simply used to the failure modes of underspecification: even from a young age we’re trained to expect disappointment if we miscommunicate our expectations, and so having the LLM fail doesn’t sting as badly as you might expect.

Granularly-Evolving Formal Specs

One potential issue I foresee with current-generation AI tools is that they center the process on a text-only workflow. In practice, smart humans do want to read something that looks like code most of the time; the issue is that they want to focus their limited attention rather than sift through thousands of lines of code. Almost anybody who has worked on a large codebase (one they did not write entirely themselves) never had more than an LLM-level understanding of parts of that codebase anyway. Instead, we embarked upon code-understanding efforts whenever we faced tricky bugs, needed to add new features, etc. We codified the results in our own mental models (memory, notes, etc.), and sometimes in documentation, bug reports, etc. Hilariously, this is now the kind of material that the LLM loves to ingest.

As we build software, we want to be able to start with a hazy specification (probably in English, but maybe in a big document) and be able to begin building an application. At key decision-making points we want to be able to solicit input and, finally, be dropped into an exploratory state where we may make our thoughts more granularly precise. For many reasons, I still believe that this should be formal, executable code, not English prose.
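As a rough illustration of what a spec that grows more precise might look like in executable form, here is a small sketch. Everything in it is hypothetical: the `dedupe` function stands in for generated code, and the `SPEC` dictionary of named properties and the `check` helper are my own invention, not an existing tool. The hazy request is "remove duplicates"; the third property is the kind of refinement added at a decision point, once someone is asked whether order matters.

```python
import random


def dedupe(items):
    """Stand-in for generated code: drop repeats, keep first occurrences."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


# First pass at the spec: each entry is a named, executable property
# relating an input list to the implementation's output.
SPEC = {
    "no duplicates in output": lambda inp, out: len(out) == len(set(out)),
    "same set of elements": lambda inp, out: set(out) == set(inp),
}

# Later refinement, added once the order question has been answered.
SPEC["keeps first-occurrence order"] = (
    lambda inp, out: out == sorted(set(inp), key=inp.index)
)


def check(impl, spec, trials=200, seed=0):
    """Run impl on random small lists; return names of violated properties."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        inp = [rng.randint(0, 5) for _ in range(rng.randint(0, 8))]
        out = impl(list(inp))
        for name, prop in spec.items():
            if not prop(inp, out) and name not in failures:
                failures.append(name)
    return failures


print(check(dedupe, SPEC))
print(check(lambda xs: sorted(set(xs)), SPEC))
```

The second call checks an implementation that satisfied the original hazy spec but violates the refinement, so the disagreement surfaces as a named property instead of a paragraph of prose to reread.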

The issue is that no single language is perfect. English is great for laypeople: any arbitrarily complex topic can be compressed into an arbitrarily-simple soundbite. Unfortunately, English is imprecise and even lies to you via the embedding. On the other end of the spectrum, we might have Lean in a loop with the LLM. The LLM is speaking Lean and there is some amount of grammar- and...
