Harness engineering for coding agent users

To let coding agents work with less supervision, we need ways to increase our confidence in their results. As software engineers, we have a natural trust barrier with AI-generated code: LLMs are non-deterministic, they don't know our context, and they don't really understand the code; they think in tokens. This article explores a mental model that brings together emerging concepts from context and harness engineering to build that trust.

02 April 2026

Birgitta Böckeler

Birgitta is a Distinguished Engineer and AI-assisted delivery expert at Thoughtworks. She has over 20 years of experience as a software developer, architect and technical leader.

generative AI

Contents

Feedforward and Feedback

Computational vs Inferential

The steering loop

Timing: Keep quality left

Regulation categories

Maintainability harness

Architecture fitness harness

Behaviour harness

Harnessability

Harness templates

The role of the human

A starting point - and open questions

Sidebars

Metaphors only go so far

How does harness engineering relate to context engineering?

Ambient affordances

Ashby's Law

This article updates an earlier memo outlining my first impressions of harness engineering.

The term harness has emerged as a shorthand to mean everything in an AI agent except the model itself - Agent = Model + Harness. That is a very wide definition, and therefore worth narrowing down for common categories of agents. I want to take the liberty here of defining its meaning in the bounded context of using a coding agent. In coding agents, part of the harness is already built in (e.g. via the system prompt, or the chosen code retrieval mechanism, or even a sophisticated orchestration system). But coding agents also provide us, their users, with many features to build an outer harness specifically for our use case and system.

Metaphors only go so far

It has been pointed out to me that wrapping harnesses around harnesses doesn't make sense: “Have you ever tried to put a harness on the inside of a dog?” So this somewhat stretches the metaphor, but I'm happy to accept that if it proves useful to navigate the usage of this word.

Figure 1: The term “harness” means different things depending on the bounded context.

A well-built outer harness serves two goals: it increases the probability that the agent gets it right in the first place, and it provides a feedback loop that lets the agent self-correct as many issues as possible before they even reach human eyes. Ultimately, it should reduce review toil and increase system quality, with the added benefit of fewer wasted tokens along the way.

Feedforward and Feedback

To harness a coding agent, we anticipate unwanted outputs and try to prevent them, and we put sensors in place that allow the agent to self-correct:

Guides (feedforward controls) - anticipate the agent's behaviour and aim to steer it before it acts. Guides increase the probability that the agent creates good results on the first attempt.

Sensors (feedback controls) - observe after the agent acts and help it self-correct. Particularly powerful when they produce signals that are optimised for LLM consumption, e.g. custom linter messages that include instructions for the self-correction - a positive kind of prompt injection.
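To make the idea of an LLM-oriented linter message concrete, here is a minimal sketch of a computational sensor. The rule (no `print()` calls in library code) and the wording of the fix hint are hypothetical examples, not taken from any real linter; the point is that the message tells the agent not only what is wrong but also how to fix it. The check uses only Python's standard `ast` module.

```python
import ast
from dataclasses import dataclass


@dataclass
class Finding:
    line: int
    message: str


# Hypothetical rule: library code should log via the `logging` module, not print().
# The message is written for the agent: it says what is wrong AND how to fix it.
FIX_HINT = (
    "Replace print() with a module-level logger: add `import logging` and "
    "`logger = logging.getLogger(__name__)`, then call logger.info(...). "
    "Do not suppress this check."
)


def check_no_print(source: str) -> list[Finding]:
    """Return one finding per print() call found in `source`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        ):
            findings.append(Finding(node.lineno, f"print() call found. {FIX_HINT}"))
    return findings


def format_for_agent(path: str, findings: list[Finding]) -> str:
    """Render findings as plain text the agent reads back from the tool output."""
    return "\n".join(f"{path}:{f.line}: {f.message}" for f in findings)
```

Run as a hook after each edit, the formatted output goes straight back into the agent's context, so the remediation instructions arrive exactly when the agent needs them.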

Used in isolation, each falls short: with feedback only, you get an agent that keeps repeating the same mistakes; with feedforward only, you get an agent that encodes rules but never finds out whether they worked.
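The interplay of guides and sensors can be sketched as a small loop. This is an illustrative model under stated assumptions, not how any particular coding agent implements it: the agent is a hypothetical stand-in function from prompt to output, guides are plain-text rules prepended to the task, and sensors are functions that return a list of problems.

```python
from dataclasses import dataclass, field
from typing import Callable

# A sensor inspects the agent's output and returns problems (empty list = pass).
Sensor = Callable[[str], list[str]]
# Hypothetical stand-in for a coding agent call: prompt in, output out.
Agent = Callable[[str], str]


@dataclass
class Harness:
    guides: list[str]                                     # feedforward: rules given up front
    sensors: list[Sensor] = field(default_factory=list)  # feedback: checks after acting
    max_attempts: int = 3

    def run(self, agent: Agent, task: str) -> tuple[str, list[str]]:
        """Steer with guides, then feed sensor findings back until the output is clean.

        Returns the last output and any problems still left for a human to review.
        """
        prompt = "\n".join(self.guides + [task])
        output, problems = "", []
        for _ in range(self.max_attempts):
            output = agent(prompt)
            problems = [p for sensor in self.sensors for p in sensor(output)]
            if not problems:
                break
            # Feedback: append the findings so the next attempt can self-correct.
            prompt += "\nFix these issues:\n" + "\n".join(problems)
        return output, problems
```

With an empty `sensors` list the loop degenerates to feedforward-only (one attempt, never checked); with an empty `guides` list it becomes feedback-only, where the agent only learns the rules after breaking them.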

Computational vs Inferential

There are two execution types of guides and sensors:

Computational - deterministic and fast, run by the CPU. Tests, linters, type checkers, structural analysis. Run in milliseconds to seconds; results are reliable.

Inferential - semantic analysis, AI code review, “LLM as judge”. Typically run on a GPU or NPU. Slower and more expensive; results are non-deterministic.

Computational guides increase the probability of good results with deterministic tooling. Computational sensors are cheap and fast enough to run on every change, alongside the agent. Inferential controls are of course more expensive and non-deterministic, but they let us provide rich guidance and add semantic judgment. Despite their non-determinism, inferential sensors can still increase our trust, particularly when used with a strong model, or rather, a model that is suited to the task at hand.
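One way to act on this cost difference is to order sensors by execution type. The sketch below assumes a policy that is a design choice, not something prescribed here: run all computational checks on every change, and only spend inference (for example an LLM-as-judge review, represented by a hypothetical stub function) once the deterministic checks pass.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Kind = Literal["computational", "inferential"]


@dataclass
class Check:
    name: str
    kind: Kind
    run: Callable[[str], list[str]]  # returns the problems found in a change


def run_checks(change: str, checks: list[Check]) -> dict[str, list[str]]:
    """Run cheap deterministic checks first; run costly inferential checks
    only if every computational check passed."""
    results: dict[str, list[str]] = {}
    for check in (c for c in checks if c.kind == "computational"):
        results[check.name] = check.run(change)
    if any(results.values()):
        # Deterministic findings are cheap to fix; skip the expensive review for now.
        return results
    for check in (c for c in checks if c.kind == "inferential"):
        results[check.name] = check.run(change)
    return results
```

The trade-off is latency versus spend: the semantic review never sees a change that still fails a linter, which saves tokens, but it also means inferential feedback arrives one iteration later.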

Examples

| | Direction | Computational / Inferential | Example implementations |
|---|---|---|---|
| Coding conventions | feedforward | Inferential | AGENTS.md, Skills |
| Instructions on how to bootstrap a new project | feedforward | Both | Skill with instructions and a bootstrap script |
| Code mods | feedforward | Computational | A tool with access to OpenRewrite recipes |
| Structural tests | feedback | Computational | A pre-commit (or coding agent) hook running ArchUnit tests that check for violations of module boundaries |
| Instructions on how to review | feedback | Inferential | Skills |
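ArchUnit is a Java library, so the following is only an analogue of the structural-test row, written in Python with the standard `ast` module rather than ArchUnit's API. The package names and the layering rule in `FORBIDDEN` are hypothetical; the sketch shows how a computational sensor can check module boundaries and phrase its violations so that an agent can act on them.

```python
import ast

# Hypothetical layering rule: code in the "domain" package must not depend on
# "web" or "infrastructure". Adjust this map to your own module boundaries.
FORBIDDEN = {"domain": {"web", "infrastructure"}}


def imported_modules(source: str) -> set[str]:
    """Collect the top-level package names imported by a Python source file.

    Relative imports (e.g. `from . import x`) are skipped in this sketch.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def boundary_violations(package: str, source: str) -> list[str]:
    """Return one agent-readable message per forbidden dependency."""
    banned = FORBIDDEN.get(package, set())
    return [
        f"Module boundary violation: '{package}' must not import '{dep}'. "
        f"Move the dependency behind an interface owned by '{package}'."
        for dep in sorted(imported_modules(source) & banned)
    ]
```

Wired into a pre-commit hook or a coding agent hook, a non-empty result would block the change and hand the messages back to the agent.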

How does...
