Harness, Scaffold, and the AI Agent Terms Worth Getting Right
Published May 25, 2026
Sergio Paniego (sergiopaniego)
Aritra Roy Gosthipaty (ariG23498)
When a field evolves quickly, its vocabulary often evolves faster than its shared understanding. Terms start to blur, get reused in different contexts, or become shorthand for ideas that are never fully explained. We are currently seeing this happen in the field of AI agents, where concepts are getting mixed together, some are renamed, and others are widely used for a few months before quietly disappearing.
This can be overwhelming for newcomers, and even for practitioners trying to keep up with the latest developments. After ICLR 2026, one of us (@ariG23498) posted a question that captured this confusion well:
"What do you mean by the terms 'harness' and 'scaffold' in the context of agents? I have heard a lot of explanations while I was at ICLR, but I could not understand why they did not converge to a single explanation."
This glossary is our attempt to ground the terms that keep coming up without clear, consistent explanations. It is not meant to be a comprehensive dictionary of every term in the field. Instead, we focus on the concepts that are often mixed up, reused in different ways, or assumed to be obvious when they are not.
Most of these terms come up whether you're building an agent, deploying one, or just using tools like Claude Code, Codex, or Hermes Agent. The last section covers concepts specific to training models, which is more relevant if you work on that side of things.
Many of these terms don't have universally accepted definitions yet, and different frameworks use the same word differently. The goal here is not to enforce one correct vocabulary, but to provide a practical mental model that makes discussions easier to follow.
Let's get started.
Table of Contents
Model
Scaffolding
Harness
Agent
Context Engineering
Policy
Tool Use
Skills
Sub-agents
Training
RL Environment
Trainer
Rollout
Reward
Learn More
Model
The model is the LLM: it takes text in and produces text out (e.g., Claude, Qwen, GPT, Kimi, DeepSeek…). On its own, it has no memory between calls and no loop. The model can express the intent to call a tool, but it needs a harness to actually execute it. It answers one prompt and stops. Wrap it in scaffolding and a harness and it becomes an agent.
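A minimal sketch of the point above, assuming a hypothetical `toy_model` function in place of a real LLM API: the model is just a function from text to text. It can write out a tool call as JSON, but nothing runs that tool, and a second call knows nothing about the first.

```python
import json


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: text in, text out, no state kept between calls."""
    if "weather" in prompt.lower():
        # The model can *express* a tool call as text, but it cannot execute it.
        return json.dumps({"tool": "get_weather", "arguments": {"city": "Paris"}})
    return "I don't know."


first = toy_model("What's the weather in Paris?")
second = toy_model("What did I just ask you?")  # no memory of the previous call
print(first)
print(second)
```

Turning that JSON string into an actual weather lookup, and remembering the exchange for the next turn, is work done outside the model.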
Scaffolding
The behavior-defining layer around the model: system prompt, tool descriptions, how the model's responses get parsed, what it remembers across steps (context management). It shapes how the model sees the world and acts in it, whether during training or at inference.
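The pieces of scaffolding listed above can be sketched as a small data structure. Everything here is hypothetical and for illustration only: the `Scaffold` class, its fields, and the JSON `{"tool": ...}` convention for tool calls are assumptions, not any product's actual format. The sketch covers three jobs: building what the model sees (system prompt plus tool descriptions), trimming history as a crude form of context management, and deciding how a raw response is parsed.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Scaffold:
    """Hypothetical container for the behavior-defining layer around a model."""
    system_prompt: str
    tools: list[dict] = field(default_factory=list)  # tool descriptions shown to the model
    max_context_messages: int = 20                    # crude context-management policy

    def build_prompt(self, history: list[str]) -> str:
        # Keep only the most recent messages: one simple form of context management.
        recent = history[-self.max_context_messages:]
        tool_text = "\n".join(f"- {t['name']}: {t['description']}" for t in self.tools)
        return f"{self.system_prompt}\n\nTools:\n{tool_text}\n\n" + "\n".join(recent)

    def parse(self, response: str) -> dict:
        # Interpret a raw response as either a tool call (JSON) or a final answer.
        try:
            data = json.loads(response)
            if isinstance(data, dict) and "tool" in data:
                return {"type": "tool_call", **data}
        except json.JSONDecodeError:
            pass
        return {"type": "final", "text": response}


scaffold = Scaffold(
    system_prompt="You are a helpful assistant. Call tools by replying with JSON.",
    tools=[{"name": "get_weather", "description": "Return the weather for a city."}],
    max_context_messages=2,
)
prompt = scaffold.build_prompt(["user: hi", "assistant: hello", "user: weather in Paris?"])
parsed_call = scaffold.parse('{"tool": "get_weather", "arguments": {"city": "Paris"}}')
parsed_text = scaffold.parse("It is sunny.")
```

Note that nothing in this class calls the model or runs a tool; it only shapes what goes in and how what comes out is read.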
Products like Claude Code, Codex, and Antigravity CLI call the whole thing a harness. Claude Code's own docs say it directly: "Claude Code serves as the agentic harness around Claude." That's the broad use: harness means everything that isn't the model. The scaffold/harness distinction matters most when you need to reason about them separately, as in a training pipeline. You'll also hear "scaffold" used more broadly to cover any infrastructure the harness relies on: hooks, runtime configuration, even directory structure.
Some products like Claude Code and Codex are tightly coupled to their provider's models. Others like Antigravity CLI and Hermes Agent let you plug in any model.
Harness
The execution layer inside the agent: it calls the model, executes the tool calls the model requests, and decides when to stop. The harness is what makes the agent run. Scaffolding, defined above, is what the model works from: its instructions, its tools, its format.
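A minimal harness loop might look like the sketch below. It assumes a hypothetical model callable that replies either with plain text (treated as a final answer) or with a JSON object such as `{"tool": "add", "arguments": {...}}`; `run_harness` and `scripted_model` are illustrative names, not a real framework's API. Error handling is deliberately left out to keep the loop visible.

```python
import json
from typing import Callable


def run_harness(
    model: Callable[[str], str],
    tools: dict[str, Callable[..., str]],
    task: str,
    max_steps: int = 5,
) -> str:
    """Hypothetical harness: call the model, execute tool calls, decide when to stop."""
    history = [f"user: {task}"]
    for _ in range(max_steps):
        response = model("\n".join(history))
        history.append(f"assistant: {response}")
        try:
            call = json.loads(response)
        except json.JSONDecodeError:
            return response  # plain text: treat it as the final answer and stop
        if not isinstance(call, dict) or "tool" not in call:
            return response
        # Execute the requested tool and feed the result back into the history.
        result = tools[call["tool"]](**call.get("arguments", {}))
        history.append(f"tool: {result}")
    return "Stopped: step limit reached."  # stop condition when the model never finishes


def scripted_model(prompt: str) -> str:
    # Hypothetical model: requests a tool once, then answers from the tool result.
    if "tool:" not in prompt:
        return json.dumps({"tool": "add", "arguments": {"a": 2, "b": 3}})
    return "The answer is " + prompt.rsplit("tool: ", 1)[1]


answer = run_harness(scripted_model, {"add": lambda a, b: str(a + b)}, "What is 2 + 3?")
print(answer)
```

The model never touches `tools` directly; only the harness does. Swapping `scripted_model` for a real model client would leave the loop unchanged.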
Harness engineering is the discipline of designing this layer well: deciding when the agent should stop, how errors get handled, and what guardrails keep it on track. It applies at both training and inference. Addy Osmani's piece and OpenAI's account of building with Codex both cover this from the inference side.
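Two of the concerns named above, error handling and guardrails, can be sketched as a single hypothetical helper that a harness might call for every tool request. The allowlist and the choice to return errors to the model as text (rather than crashing the loop) are assumptions made for illustration; real harnesses handle these decisions in many different ways.

```python
from typing import Any, Callable


def execute_tool_call(
    call: dict[str, Any],
    tools: dict[str, Callable[..., Any]],
    allowed: set[str],
) -> str:
    """Hypothetical harness helper: enforce a guardrail, then run the tool and catch errors."""
    name = call.get("tool")
    # Guardrail: refuse any tool that is not explicitly allowed in this deployment.
    if name not in allowed:
        return f"error: tool '{name}' is not allowed"
    if name not in tools:
        return f"error: unknown tool '{name}'"
    try:
        return str(tools[name](**call.get("arguments", {})))
    except Exception as exc:
        # Report the failure back to the model instead of crashing the agent loop.
        return f"error: {type(exc).__name__}: {exc}"


tools = {"divide": lambda a, b: a / b, "delete_repo": lambda: "deleted"}
allowed = {"divide"}

ok = execute_tool_call({"tool": "divide", "arguments": {"a": 6, "b": 3}}, tools, allowed)
failed = execute_tool_call({"tool": "divide", "arguments": {"a": 1, "b": 0}}, tools, allowed)
blocked = execute_tool_call({"tool": "delete_repo"}, tools, allowed)
print(ok, failed, blocked, sep="\n")
```

Returning the error as text gives the model a chance to recover (for example, by retrying with different arguments), while the allowlist keeps a dangerous tool unreachable even if the model asks for it.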
At evaluation time, the same pattern shows up as an eval harness: instead of collecting training data, it runs a fixed set of scenarios against a model checkpoint and records metrics rather than updating weights.
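An eval harness can be sketched under similar assumptions: a hypothetical frozen `checkpoint` function, a fixed list of (prompt, expected answer) scenarios, and exact-match accuracy as the metric. The names and the metric are illustrative choices. The key difference from a training loop is that the model is only called, never updated.

```python
from typing import Callable


def run_eval(model: Callable[[str], str], scenarios: list[tuple[str, str]]) -> dict[str, float]:
    """Hypothetical eval harness: run fixed scenarios and record metrics; no weight updates."""
    correct = 0
    for prompt, expected in scenarios:
        output = model(prompt)  # inference only: no gradient step, no data collection
        correct += int(output.strip() == expected)
    return {"accuracy": correct / len(scenarios), "num_scenarios": float(len(scenarios))}


def checkpoint(prompt: str) -> str:
    # Hypothetical frozen checkpoint with canned answers (one of them wrong).
    answers = {"2 + 2 =": "4", "Capital of France?": "Paris", "3 * 3 =": "6"}
    return answers[prompt]


scenarios = [("2 + 2 =", "4"), ("Capital of France?", "Paris"), ("3 * 3 =", "9")]
metrics = run_eval(checkpoint, scenarios)
print(metrics)
```

Because the scenario list is fixed, running the same function on two checkpoints gives directly comparable numbers.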
Agent
The term comes from reinforcement learning, where an agent is simply a function that takes an observation and returns an action. The environment takes that action and returns a new observation, and the loop repeats. That loop is still at the core of how LLM agents work.
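The classic reinforcement-learning loop described above fits in a few lines. `CountingEnv` and `agent` are hypothetical toys, loosely following the reset/step pattern common in RL libraries; to match the description above, `step` here returns only the new observation and a done flag (no reward).

```python
class CountingEnv:
    """Hypothetical toy environment: the observation is a counter; reach the target to finish."""

    def __init__(self, target: int = 3):
        self.target = target
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> tuple[int, bool]:
        # Apply the action, then return (new observation, done flag).
        self.state += action
        return self.state, self.state >= self.target


def agent(observation: int) -> int:
    """In RL terms, the agent is just a function from observation to action."""
    return 1  # always move the counter forward by one


env = CountingEnv(target=3)
obs = env.reset()
done = False
steps = 0
while not done:
    action = agent(obs)             # agent: observation -> action
    obs, done = env.step(action)    # environment: action -> new observation
    steps += 1
print(steps, obs)
```

An LLM agent keeps this same shape: the observation becomes text (the conversation plus tool results), and the action becomes the model's next message or tool call.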
In the LLM world, the term has expanded. An agent is a model plus everything around it that lets it act, not just respond. It turns raw text generation into something that can act in a loop: taking in information, deciding what to do, and acting on the results.
Take a coding agent as a concrete example. The system prompt, tool descriptions, and the output format the model follows form the scaffolding. The loop that calls the model,...