Agentic coding and mental models
11 June 2026

I reckon I’ve drafted and then deleted a version of this post
at least 10 times in the last 12 months.
Deleted because it falls in the category
“I must be wrong about this as everyone else is saying the opposite”.
But this week’s release of Fable,
and especially the reasons people are saying it’s such an improvement,
are the nudge I needed to finally publish.

So, here goes:
I think everyone is wrong
about how to write code with LLMs.
Or at least,
I think they’re wrong
about how I should write code with LLMs.

The reason is to do with mental models.
When you write code,
the code is not the only artifact that’s generated.
You also construct a mental model
of how the code works,
its runtime behaviour under different conditions,
how it fails and so on.
Outside of toy projects,
this mental model is rarely perfect.
Improvement and maintenance of the mental model
continues for at least as long as
improvement and maintenance of the code itself.

Mental models also have an alarming tendency that code doesn’t;
they degrade rapidly whenever you’re not thinking about them
and the cost of reconstructing them increases
the longer you continue not to think about them.
They’re wriggly little buggers,
mental models.

Now, a few times in my career,
I’ve been fortunate enough
to work with bona fide programming geniuses.
They come in various forms
but a common thread is their ability
to construct mental models rapidly and accurately,
then apply them successfully on a complex project.

Unfortunately, I’m not one of these geniuses.
I’m a regular engineer,
somewhere in the middle
of the bell curve of programming ability.
So for me,
every mental model is the result of hard struggle.
Reading lines of code,
observing behaviour,
printing runtime state,
using a debugger,
occasionally resorting to trial-and-error changes just to see what happens.
In this way, gradually,
I inch myself closer to sufficient understanding
that I can make changes without breaking stuff
most of the time
(and yet, still stuff breaks 🤔).

I value my mental models highly.
To me they hold greater value
than the code itself,
which might sound crazy to some people, I guess.
A mental model is a delicate flower
that must be cultivated with care
and protected against trampling.

Oh, look!
Here come the LLMs
to trample all over my mental model.

At this point I should make clear,
I’m not anti-LLM per se.
I use them as my daily driver at work
and on numerous side projects in my spare time.
Claude even fixed a 5-year-old memory leak
that I’m not ashamed to admit I was unable to fix on my own.
So I’m sold on the technology.
But I’m not sold on how I’m being told to use it.

Everything I read encourages me
to turn the automation dial up to 11.
If I don’t let agents work autonomously,
I won’t get their full benefit.
Actually I should have swarms of agents
running in parallel,
helping me to ship more features at once.
I hear at some workplaces there are usage leaderboards
and the assumption seems to be
that token spend directly correlates with business value.
I find this reasoning insane.

Let’s detour briefly to talk about code reviews.
It’s quite well understood, I think,
that big code reviews are harder than small code reviews.
As responsible engineers,
we try to separate large changesets into smaller independent units,
to help our colleagues review them better
and so we receive better feedback as a consequence.

Give me a 200-line code review in a familiar codebase
and I’ll feel confident about understanding its impact.
At that size,
it’s easy to update my mental model
and assimilate whatever the change does
(or what it tries to do).
Then I might see some tradeoffs in the approach;
there could be performance concerns,
lurking footguns
or perhaps an existing abstraction could be re-used
to make the change more cohesive.

Make it a 1k-line code review instead
and things take a little longer,
but the size is still within reason.
Increase it to 2k lines
and I’m setting aside a solid block of time
and giving it multiple passes.
At 5k lines we’ve probably exceeded my capacity
to meaningfully review the change
unless it’s broken into smaller chunks.

You see the pattern.
Increasing autonomy for coding agents has precisely the same effect
as increasing the size of code reviews.
It makes my job harder,
slows down development of my mental model
and decreases confidence that I understand what’s going on.

Automation evangelists might tell me to let go at this point;
automate code reviews,
automate verification,
automate bug fixes,
let the machines do all the work.
I’m sure this works for some people,
geniuses to the right of the bell curve perhaps,
but it doesn’t work for me
because I’m left without a working mental model.
At that...