The Silent Critic - EXECUTIVE ORC HOUSE
Like most folks, I’ve been using The Models¹ to write code now for the better part of a year. My process has changed over the course of the last few months, partly because the models are getting better at executing on the tasks I set them, but mostly because the gap between what the models enable and the systems for controlling context that we all live in is growing, and growing at an ever-increasing rate.
If you’ve read my earlier musings, you’ll have a sense of my theory about what’s happening. Our intuitions and practices, put together over the last couple of decades when code was expensive relative to human attention, no longer fit the bill.
So I’ve changed my code review habits; I use the models on the review artifacts, not to replace my attention, but to focus it. But that’s very ad hoc, and it’s subject to a lot of noise – the models often see things that are real, but not issues; or they miss design changes that are underdocumented by the author. This isn’t a condemnation of my fellow humans, or of the models; but it’s pretty clear to me that the systems we have inherited guide us into habits and constructions that work at cross-purposes to the ultimately liberatory possibility of a natural-language-driven interface to building software.
The true enemy, so to speak, is the whole process of software, and the economics of the software industry, but that’s a bit out of scope for one guy with a Claude subscription. The near enemy is the tooling. Happily, I have spent the last 35 years writing tools, so I rolled up my sleeves, got down to it, and built a … thing. It’s not a harness (quite); it’s not a reviewer (really); it’s a thing. I call it “The Silent Critic”.
I’m a huge fan of the author Jack Vance; I think he’s the greatest stylist in English letters in the latter half of the 20th century, and while he was a creature of his time, and some of his politics haven’t aged particularly well, I love his books, and think about his universes a lot. His worlds are strange; his characters complex (for pulp SF, but also in general); and he can tell a rollicking good yarn; but it’s his prose voice – syrupy, thick with meaning that hovers just outside the boundaries of familiarity, wry and cutting – that really sets him apart.
In particular, he has a tetralogy called Planet of Adventure, about a huge, ancient planet that hosts several alien species, as well as humans, surprisingly. In the fourth book, The Pnume, we meet the old, hidden masters of the planet Tschai: an insectile alien species called the Pnume, who have enslaved humans (known as the “Pnumekin”) from time nearly immemorial, and have co-evolved an underground society where quietude and good behaviour rule. This equanimity is maintained not by threats of violence, but internally, by the Pnume and Pnumekin both, from their overmastering sense of propriety. In the course of this picaresque, we encounter two ominous Pnume figures, named The Warden and The Silent Critic; and the hero, Adam Reith, sees that the propriety is internalized, yes; Pnumekin society is calm, yes; but there are also these fearsome figures who cow not merely the Pnumekin, but other Pnume.
What does this have to do with agentic coding? Bear with me. What I’ve noticed is that underspecification, which is part and parcel of natural language interfaces, leads the agents to, as Zap 210 would say, “boisterous conduct”. What does this mean? Well, for one, it’s context escape – the models assume things about their working context, because that’s underspecified, and they introduce context that they can find, from the environment, from the shell, from the filesystem, sometimes seemingly from the aether. For another, it’s gaming the system; you tell them to do something, and they’ll do it. They are, perhaps unsurprisingly!, extremely literal-minded. They will absolutely game requirements if that enables them to perform the task the operator sets them. It’s hard to blame them,² but it demands a different kind of vigilance than we’re used to.
What do I mean? Well, we’re used to our tools being deterministic, if limited. If we compile something, it stays compiled. If we run a tool, it picks up the context we made available to it, and then it stops. This isn’t to say that context escape and a hyper-eager search for loopholes are not part of our daily experience as programmers; all of us have, you know, left an environment variable defined that causes unexpected behaviour; but those behaviours are, to some reasonable degree, pace solar wind, deterministic, and we have developed scar tissue and techniques as a consequence.
The models make a mockery of those techniques; they trick us. They do their own thing, with only the most formal, legalistic relationship to our VERY REASONABLE REQUEST....