On the success of 'natural language programming' - Marc's Blog
About Me
My name is Marc Brooker. I've been writing code, reading code, and living vicariously through computers for as long as I can remember. I like to build things that work. I also dabble in machining, welding, cooking and skiing.
I'm currently an engineer at Amazon Web Services (AWS) in Seattle, where I work on databases, serverless, and serverless databases. Before that, I worked on EC2 and EBS.
All opinions are my own.
On the success of ‘natural language programming’
Specifications, in plain speech.
I believe that specification is the future of programming.
Over the last four decades, we’ve seen the practice of building programs and software systems grow closer and closer to the practice of specification. Details of the implementation, from layout in memory and disk, to layout in entire data centers, to algorithm and data structure choice, have become more and more abstract. Most application builders aren’t writing frameworks, framework builders aren’t building databases, database builders aren’t designing protocols, protocol designers aren’t writing kernels, and so on. Our modern software world is built on abstractions.
Significant advancements are made, from time to time, by cutting through these abstractions. But still, the abstractions dominate, and will continue to.
The practice of programming has become closer and closer to the practice of specification. Of crisply writing down what we want programs to do, and what makes them right. The how is less important.
I believe that natural language will form the core of the programming languages of the future.
The Ambiguity Problem
The most common objection to this view is that natural language is ambiguous. Its exact meaning is potentially unclear, and highly dependent on context. This is a real problem.
For example, in The Bug in Paxos Made Simple, I look at a common bug in implementations of Paxos caused directly by the ambiguity of natural language.
Pointing out this ambiguity isn’t criticizing [Lamport’s] writing, but rather reminding you about how hard it is to write crisp descriptions of even relatively simple distributed protocols in text.
As Lamport says:
Prose is not the way to precisely describe algorithms.
Perhaps the most famous statement of this problem is Dijkstra’s from On the foolishness of “natural language programming”:
When all is said and told, the “naturalness” with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.
Dijkstra’s argument goes beyond merely pointing out the ambiguity and lack of precision of natural language; it also points out the power of symbolic tools. All of these arguments are true. Reasoning using the symbolic and formal tools of mathematics is indeed powerful. It is tempting to poke holes in this argument by pointing out that most programs don’t need precise specification, and that there’s a large opportunity for natural language to specify those programs. This argument is true, but doesn’t go far enough.
Instead, I argue that ambiguity doesn’t doom natural language programming for one simple reason: almost all programs are already specified in natural language. And always have been.
Where Do Programs Come From?
Programs come from requirements from people, and people specify the need for those programs using that least precise of tools: natural language. We talk to customers, to stakeholders, to product managers, and other consumers of our programs and ask them what they want. Sometimes, we’ll get a precise specification, like an OpenAPI spec or an RFC. More often, we’ll get something fuzzy, incomplete, and ambiguous.
That tends to work for two reasons. First, we’ll apply context. Common sense. Our understanding from similar successful projects about the requirements that users have in common. Or maybe even formal compliance requirements. Second, we’ll have a conversation. “Hey, I didn’t quite understand your requirement for the daily average, do you want the mean or median?” Or “can you make it so I don’t lose data even if a machine fails?” Software teams and organizations have these conversations continuously.
This is how software is built professionally, how software construction is taught, and how open source and even hobby communities build their systems.
Sometimes, these conversations will become formal. A snippet of code. A SQL query. An example. But most often they’re informal. A conversation. A napkin sketch. Some hand-waving over lunch.
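To make the mean-or-median question concrete, here is a minimal sketch of the kind of snippet that can settle such a conversation. The sample values and the function names (daily_average_mean, daily_average_median) are made up for illustration and do not come from any real system. The point is only that two reasonable readings of “daily average” can give very different answers on the same data.

```python
from statistics import mean, median

# Hypothetical measurements for one day (illustrative values only).
# A single outlier is enough to pull the two readings far apart.
samples = [10, 12, 11, 13, 500]


def daily_average_mean(values):
    """One reading of 'daily average': the arithmetic mean."""
    return mean(values)


def daily_average_median(values):
    """Another reading of 'daily average': the median."""
    return median(values)


print(daily_average_mean(samples))    # 109.2
print(daily_average_median(samples))  # 12
```

Showing both numbers side by side turns a fuzzy requirement into a question that can be answered with one word.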
LLMs allow us to include our computers in these conversations.
Specifications are Loops
Vibe coding is the ultimate embodiment of this: building a specification for a program based on a conversation of “yes, and” and “no, but”. A closed loop of a developer and an AI model or agent having a...