See, Act, Correct: three levers for working with a code agent

Olivier Wulveryck — 2026-06-04 — https://blog.owulveryck.info/2026/06/04/see-act-correct-three-levers-for-working-with-a-code-agent.html

An out-of-the-box code agent only sees a repo and a shell. For professional engineering, that is not enough. Here are the principles that make the difference between a gadget and a production tool.

Foreword. This article grew out of the talk “Beyond the Basics with Claude Code” by Daisy Holman, an engineer on the Claude Code team (May 2026). The founding ideas come from that talk. I have supplemented them with my own field experience and cross-checked them against published research and available data.

“If the agent can’t do everything you do, it can’t work with you.” — Daisy Holman, Anthropic

That sentence sums up the problem every team eventually encounters. A freshly installed code agent knows how to read code and run commands. That is enough for a prototype, a side project, a zero-to-one exercise. But real engineering work does not live in source code. It lives in Slack threads, design docs, production dashboards, review discussions, and unwritten architecture decisions. The code tells you what was done; it rarely tells you why.

Try this: spend an entire day in your agent’s terminal, without ever switching to another tool. Every alt-tab is a missing connection, a place where the agent cannot follow you. Customizing an agent is not a luxury. It is a structural necessity.

This article is aimed at senior engineers, tech leads, and architects who are deploying or considering deploying code agents in production environments. It lays out three invariant principles (valid regardless of the tool) and a method for applying them.

Transparency note: this framework was built from extensive experience with Claude Code, then generalized. The mechanism examples (skills, hooks, MCP, prompt files) are borrowed from that ecosystem, but the principles apply to the equivalent primitives of other tools (rules files for Cursor, custom instructions for Copilot, repo maps for Aider). When a recommendation is tool-specific, it is flagged.

1. The mental model: See, Act, Correct

Every code agent, regardless of the engine powering it, boils down to three questions:

SEE: what does the agent know? What sources of information are accessible to it? Code, logs, internal documentation, conversation history, CI state.

ACT: what can the agent do? Edit files, run tests, open PRs, query an API, deploy.

CORRECT: what corrects it? Linters, test failures, review feedback, validation hooks, automatic feedback loops.
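To make the three questions concrete, here is a minimal sketch of them as a loop. Everything in it is hypothetical: the `Agent` class, its `sources`, `actions`, and `checks` fields, and the `edit` function, which stands in for a real model call. It only illustrates how a failed check feeds back into what the agent sees on the next round; it is not how any particular tool is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    """Toy model of the three levers. All names are illustrative assumptions."""
    sources: dict[str, Callable[[], str]] = field(default_factory=dict)     # SEE
    actions: dict[str, Callable[[str], str]] = field(default_factory=dict)  # ACT
    checks: dict[str, Callable[[str], bool]] = field(default_factory=dict)  # CORRECT

    def see(self) -> str:
        # Concatenate every accessible source into one context string.
        return "\n".join(f"[{name}] {read()}" for name, read in self.sources.items())

    def act(self, name: str, context: str) -> str:
        # Run one named action against the current context.
        return self.actions[name](context)

    def correct(self, result: str) -> list[str]:
        # Return the names of the checks that the result fails.
        return [name for name, ok in self.checks.items() if not ok(result)]

    def run(self, action: str, max_rounds: int = 3) -> tuple[str, list[str]]:
        context = self.see()
        result, failures = "", []
        for _ in range(max_rounds):
            result = self.act(action, context)
            failures = self.correct(result)
            if not failures:
                break
            # Feed the correction signal back into what the agent sees.
            context += "\nfailed checks: " + ", ".join(failures)
        return result, failures


def edit(context: str) -> str:
    # Stand-in for a model call: it only produces clean output after feedback.
    return "x = 1" if "failed checks" in context else "x = 1  "


agent = Agent(
    sources={"repo": lambda: "main.py"},
    actions={"edit": edit},
    checks={"no_trailing_whitespace": lambda out: out == out.rstrip()},
)
result, failures = agent.run("edit")
print(repr(result), failures)
```

In this toy run the first attempt fails the whitespace check, the failure is appended to the context, and the second attempt passes. Remove the check and the agent stops at the first, flawed output: with nothing to correct it, it has no reason to try again.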

This framework aligns with several independent traditions: the OODA cycle (Observe, Orient, Decide, Act) in decision theory, the Observation / Action / Reward paradigm in reinforcement learning, and feedback loops in systems engineering. This is no coincidence: these are structural constraints of any system that perceives an environment, acts on it, and adjusts.

This convergence is not perfect. Boyd’s OODA cycle includes an Orientation phase (the step where raw data is interpreted and synthesized) that our framework subsumes under SEE. This choice is deliberate: in production engineering, Orientation is largely the product of context packaging (which information, in what order, in what form). But the cost must be acknowledged: a framework without explicit Orientation tends to treat the agent as a purely reactive system. For tasks requiring deep architectural reasoning, this dimension warrants separate treatment.

Similarly, this framework deliberately differs from the cognitive architectures proposed by academic research. Lilian Weng’s widely cited architecture (“LLM Powered Autonomous Agents,” 2023) divides the agent into Planning, Memory, and Tool Use [1]. Shunyu Yao’s ReAct paradigm distinguishes the reasoning trace from the external action and interleaves the two. Our framework sacrifices that granularity in favor of applicability: in production, planning and memory are context packaging problems (how much history to inject, in what form, at what point). But if your agent requires self-reflection (the ability to revise its own decisions independently of external correction signals), this dimension merits separate treatment.
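As an illustration of what "context packaging" can mean in practice, here is a minimal sketch of one possible policy: always keep the instructions, then add the most recent history entries until a budget is exhausted. The function name `pack_context` and its parameters are hypothetical, and the budget is counted in characters as a crude stand-in for tokens; a real setup would use the tokenizer of the model in question.

```python
def pack_context(instructions: str, history: list[str], budget: int) -> str:
    """Keep the instructions, then as many of the newest history entries as fit.

    Assumption: `budget` is a character count, used here as a rough proxy for tokens.
    """
    kept: list[str] = []
    remaining = budget - len(instructions)
    for entry in reversed(history):  # walk from newest to oldest
        cost = len(entry) + 1  # +1 for the newline that joins entries
        if cost > remaining:
            break  # stop at the first entry that does not fit
        kept.append(entry)
        remaining -= cost
    # Restore chronological order before joining.
    return "\n".join([instructions, *reversed(kept)])


packed = pack_context("rules", ["aaaa", "bb", "c"], budget=10)
print(packed)
```

With a budget of 10 characters, the two newest entries fit and the oldest is dropped. The three questions from the paragraph above map onto the three choices in the code: how much (the budget), in what form (plain joined lines), and at what point (newest first).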

The parallel between CORRECT and the reward signal in RL is not just a metaphor. Recent work on reinforcement learning training for code models explicitly uses composite reward functions combining functional correctness (unit tests), syntactic correctness (linters), and semantic structure (data-flow graphs) [2]. Your linters and tests are not mere after-the-fact filters; they function as multidimensional reward functions that shape the agent’s generation policy.
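To show the shape of such a composite signal, here is a small sketch that folds three correction signals into one score. It is not the formula from the cited work: the `Signals` fields, the weights, and the `1 / (1 + lint_errors)` mapping are all assumptions chosen for illustration, and `structure_score` is a placeholder for whatever structural comparison you have available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Hypothetical correction signals collected after one agent attempt."""
    tests_passed: int
    tests_total: int
    lint_errors: int
    structure_score: float  # assumed to be in [0, 1]; how it is computed is out of scope


def composite_reward(
    s: Signals,
    w_tests: float = 0.6,   # illustrative weights, not taken from any paper
    w_lint: float = 0.2,
    w_struct: float = 0.2,
) -> float:
    # Functional correctness: fraction of unit tests passing.
    functional = s.tests_passed / s.tests_total if s.tests_total else 0.0
    # Syntactic correctness: 1.0 with no lint errors, decaying as errors pile up.
    syntactic = 1.0 / (1.0 + s.lint_errors)
    return w_tests * functional + w_lint * syntactic + w_struct * s.structure_score


clean = composite_reward(Signals(tests_passed=10, tests_total=10, lint_errors=0, structure_score=1.0))
sloppy = composite_reward(Signals(tests_passed=8, tests_total=10, lint_errors=3, structure_score=0.5))
print(clean, sloppy)
```

The point of the sketch is that each dimension contributes separately: an attempt that passes every test but piles up lint errors still scores lower than a clean one, which is how linters and tests act as more than a pass/fail filter.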

Everything else (instruction files, MCP servers, hooks, skills,...
