The Agent Harness: Runtime, Not Prompt Engineering, Defines Production Agents

The Agent Harness: Why Runtime Control, Not Prompt Engineering, Defines Production Agents (translation) — Guibai ← Back to the summary --> Google Translate is unavailable. The original on juejin.cn is authoritative. --> The Agent Harness: Why Runtime Control, Not Prompt Engineering, Defines Production Agents

agent harness is the runtime control system that wraps the model. It is responsible for context assembly, tool exposure, permission checks, loop control, state persistence, observation processing, UI/audit projection, trace recording, and final output constraints.

People who truly understand harness don't focus on "how to make the model act more like a certain role"; they focus on:

Which things can be left to the model's judgment?

Which things must be enforced by code?

Where do the tools and context the model sees come from?

How do tool results become material for the next round of reasoning?

When should the loop continue, and when should it stop?

Can the final answer be traced back to evidence in the run trace?

If someone understands an agent mainly as "one prompt plus a few tools," they are usually still at the application layer.

If they can break an agent down into runtime state, tool surface, permission policy, observation, loop controller, projection, trace, and output contract, then they have entered the harness layer.

1. The Quickest Test: Ask Them About the Data Flow of a Single Turn

You can ask directly:

After a user sends a task, what happens from input to final answer?

A relatively complete answer should be close to the following chain:

User Input -> Intent/Context Assembly -> Prompt Compiler -> Tool Surface Resolver -> Model Call -> Tool Call -> Permission Check -> Tool Execution -> Raw Tool Result -> Validation/Sanitization -> Observation -> Loop Controller / Stop Policy -> Projection / Trace -> Final Answer

This isn't about memorizing terminology; it's about seeing whether they have built a runtime mental model.

If their answer is:

User Input -> Assemble Prompt -> Call Model -> Model Calls Tool -> Return Answer

This only shows they know the general flow, but haven't yet grasped the critical boundaries of a harness.

2. What Each Layer Specifically Does

1. User Input: Not Fed Directly to the Model

User input is the task entry point, but it cannot become the entire context as-is.

The harness must first determine:

Is this a general Q&A, troubleshooting, code modification, approval response, or long-task recovery?

Is it associated with an existing session, case, incident, host, repo, file, or environment?

Does historical context need to be loaded?

Does a specific runtime profile need to be triggered?

Are there security risks or permission boundaries?

For example:

User: Check why payment-api has been returning 500 for the last 10 minutes.

The harness shouldn't just send this sentence to the model. It should construct a structured task:

"intent": "diagnose_service_error", "service": "payment-api", "time_range": "last_10m", "risk": "read_only", "expected_output": ["symptom", "impact", "likely_cause", "evidence", "next_steps"]

The key at this stage is: transforming natural language into a runtime-manageable task framework.

2. Intent/Context Assembly: Deciding What Context This Turn Should Carry

Intent/context assembly is the context assembly layer.

It decides:

What type is the current task?

Which business contexts should be loaded?

Which system states should be injected?

Which historical messages are still relevant?

Which evidence or artifacts should enter the model?

Which content stays only in the trace and does not enter the model context?

For example, in an SRE RCA scenario, it might assemble:

- service: payment-api - environment: prod - time range: last 10m - known dependencies: db-primary, redis-cache - recent incidents: none - allowed action level: read-only

Someone who understands harness knows: more context is not always better. The goal of context assembly is: enough to complete the task, without polluting the model, blowing up the context window, or leaking unauthorized information.

3. Prompt Compiler: Compiling Runtime State into Model Input

The prompt compiler is not simple string concatenation; it compiles multiple layers of information into the input the model actually sees.

It typically includes:

system/developer rules

agent role/profile

task-specific instructions

dynamic context

tool usage policy

output contract

previous observations

constraints and budgets

For example:

System: You are a controlled SRE RCA agent. Developer: All dangerous operations must pass an approval gate. Task: Diagnose the 500 errors on payment-api in the last 10 minutes. Context: service=payment-api, env=prod, time_range=last_10m. Output contract: Must output symptom, impact, evidence, likely cause, next steps.

Those who truly understand will distinguish:

The prompt is responsible for guiding model behavior; the runtime is responsible for enforcing...

The Agent Harness: Runtime, Not Prompt Engineering, Defines Production Agents

Related Articles

(no title)

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

ZCode – Harness for GLM-5.2

Apertus – Open Foundation Model for Sovereign AI