Making budget models punch above their weight with a smart Rust harness

(iterate think thoughts)

June 8, 2026

Dirge is an agentic harness that I've been developing for my own use, and it's getting to the point where it's becoming generally useful. In this post, I'll discuss some of the rationale behind it and the features that differentiate it from other tools in this space.

The first things to note are its performance and memory footprint. Most existing coding tools like OpenCode are rather memory-intensive, often using around 300 MB of RAM even when sitting there doing nothing. Some tools like Claude Code even lag at times while you're typing. Dirge is written in Rust and compiles to a small, fast binary weighing in at about 30 MB. It needs only around 8 MB of RAM while idle, and working on tasks pushes that up to roughly 15 MB. At roughly 15 MB under load, you could run about twenty copies of Dirge in the memory that a single idle instance of OpenCode uses.

However, lean size alone is not the main point, and there are other Rust-based harnesses to choose from. What makes Dirge actually interesting is how it supports less capable models and gets the most out of them. The conventional wisdom is that intelligence resides in the model, and the harness is treated as a matter of minimal plumbing. Its sole job is to give the model a tool loop along with a system prompt, and then to stay out of the way. In this view, the only way of getting a better agent is to get a bigger, and typically more expensive, model.

Spending a lot of time doing agentic coding with models such as DeepSeek and Qwen has changed my mind on the subject. It turns out that much of what makes an agent effective in practice lies in how well the harness meets the expectations of the model. The model usually knows what it wants, and can figure out how to get there and what actions it needs to take. What makes one setup feel cutting-edge and another feel frustrating is everything that surrounds the model.
A harness needs to guide the model before it acts, correct mechanical errors, and tell the model exactly what went wrong. It should also remember what has been learned from each attempt and manage the context intelligently. Frontier labs build much of this into post-training and tune their own harnesses to fit the strengths and weaknesses of their specific model. While Dirge cannot change how a model was trained, it can close the performance gap by meeting the model where it is. Once you invest a bit of work in the harness capabilities, a cheaper open model starts to behave like one that costs much more.

The gap appears at three different time scales, and Dirge invests in all three.

The shortest scale is the individual tool call. Each time the model makes a tool call, it can either succeed or fail. Maybe the call is malformed. Maybe it edits a file and introduces a syntax error. Or maybe it gets stuck retrying the same failing command over and over. In each case, a failed step consumes time and tokens without advancing the task, and these failures quickly accumulate and fill the context window with noise, leading the model to lose the thread of what it's doing.

The next scale is the session. The longer a session runs, the worse the model gets at following instructions, because as the window nears its limit, earlier instructions and corrections get truncated or forgotten. So the model continues to repeat mistakes or ignore earlier context.

Things get even worse across sessions, since each new agent starts out completely ignorant of what went on before. The model doesn’t remember any past decisions, file structures, or problems you’ve already solved. Every session has to rebuild its understanding of the codebase from scratch.

Let's take a look at what Dirge does with each separate piece of the puzzle. It attacks them in a certain order, and each piece connects to its neighbor as part of one whole process. As is often the case, the whole is more than the sum of its parts.
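To make the per-call failure mode concrete, here is a minimal sketch of how a harness might notice a model retrying the same failing command. It is not Dirge's actual code: `RetryGuard`, its `limit` parameter, and the wording of the hint are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of Dirge): tracks how often an identical
/// tool call has failed since it last succeeded.
struct RetryGuard {
    limit: u32,
    failures: HashMap<String, u32>,
}

impl RetryGuard {
    fn new(limit: u32) -> Self {
        Self { limit, failures: HashMap::new() }
    }

    /// Record the result of one call. Returns a hint for the model once the
    /// same call has failed `limit` times without succeeding in between.
    fn record(&mut self, call: &str, succeeded: bool) -> Option<String> {
        if succeeded {
            // A success clears the failure streak for this call.
            self.failures.remove(call);
            return None;
        }
        let count = self.failures.entry(call.to_string()).or_insert(0);
        *count += 1;
        if *count >= self.limit {
            Some(format!(
                "`{call}` has failed {count} times without succeeding; try a different approach"
            ))
        } else {
            None
        }
    }
}
```

With a limit of 3, the first two failures of a given command pass silently and the third returns a hint that the harness could feed back to the model in place of yet another raw error dump.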
How Dirge works

Dirge is essentially a state machine wrapped around the model. It runs a series of steps, each of which consists of running the model, classifying the reply, verifying and executing any tool calls, and finally verifying that the job has really been done before allowing the model to stop. This loop by itself is just the core plumbing. What makes it a real power multiplier are three layers of apparatus wrapped around it, each of which corresponds to one of the time scales we just discussed. A steering-and-repair layer ensures that each turn lands. A long-horizon layer ensures continuity within a session despite the limited size of the context window. A learning layer carries hard-won knowledge between sessions, storing it in a single SQLite database associated with the project. On top of that sits a plugin system which lets you reach into any part of the...
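The core loop described above (run the model, classify the reply, verify and execute tool calls, verify completion before stopping) can be sketched as a small state machine. This is an illustrative sketch under my own assumptions, not Dirge's implementation: `State`, `Reply`, `Outcome`, `run_loop`, and the closure parameters are all hypothetical names, and the model, tool runner, and completion check are passed in as plain closures so the example stays self-contained.

```rust
/// A model reply after classification (hypothetical categories).
#[derive(Debug, Clone, PartialEq)]
enum Reply {
    ToolCall { name: String, args: String },
    ClaimsDone,
    Malformed(String),
}

/// How the loop ended.
#[derive(Debug, PartialEq)]
enum Outcome {
    Done { turns: usize },
    TurnLimit,
}

/// The states of the harness loop.
enum State {
    CallModel,
    HandleReply(Reply),
    VerifyDone,
}

/// Drive the loop until the completion check passes or `max_turns` is reached.
/// Returns the outcome plus the feedback transcript shown to the model.
fn run_loop(
    mut model: impl FnMut(&[String]) -> Reply,
    mut run_tool: impl FnMut(&str, &str) -> Result<String, String>,
    mut job_done: impl FnMut() -> bool,
    known_tools: &[&str],
    max_turns: usize,
) -> (Outcome, Vec<String>) {
    let mut transcript: Vec<String> = Vec::new();
    let mut turns = 0;
    let mut state = State::CallModel;
    loop {
        state = match state {
            State::CallModel => {
                if turns == max_turns {
                    return (Outcome::TurnLimit, transcript);
                }
                turns += 1;
                State::HandleReply(model(&transcript))
            }
            // Tell the model exactly what went wrong instead of failing silently.
            State::HandleReply(Reply::Malformed(why)) => {
                transcript.push(format!("error: malformed reply: {why}"));
                State::CallModel
            }
            // Verify the call before executing it, then report the result.
            State::HandleReply(Reply::ToolCall { name, args }) => {
                if !known_tools.contains(&name.as_str()) {
                    transcript.push(format!("error: unknown tool `{name}`"));
                } else {
                    match run_tool(&name, &args) {
                        Ok(out) => transcript.push(format!("ok: {out}")),
                        Err(e) => transcript.push(format!("error: {e}")),
                    }
                }
                State::CallModel
            }
            State::HandleReply(Reply::ClaimsDone) => State::VerifyDone,
            // The model may only stop once the harness agrees the job is done.
            State::VerifyDone => {
                if job_done() {
                    return (Outcome::Done { turns }, transcript);
                }
                transcript.push("not done: completion check failed".to_string());
                State::CallModel
            }
        };
    }
}
```

The design point the sketch tries to capture is that the model's "I'm done" is treated as a claim to be checked rather than a command to stop: a failed check becomes one more piece of feedback, and the loop continues.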
