Supervisor Agents Don't Exist Yet


A definition of the layer that sits inside a main agent's cycle, decides if each proposed action is acceptable, and either lets it through, nudges it back on path, or stops it.

Subho Halder May 21, 2026

This is a thesis post about supervisor agents. A supervisor agent is a separate process that sits inside a main agent’s cycle. Every action the main agent proposes passes through the supervisor before it executes. The supervisor decides whether the action is acceptable, and depending on the verdict either lets it through, nudges the main agent back onto the right path, flags the action for a human, or blocks it outright. The layer doesn’t really exist yet, not in the form it needs to. I wrote this because every time I describe what I mean to someone building an agent, they nod, go back to their team, and ship another foundation LLM with a long system prompt. The vocabulary isn’t landing. The post is the vocabulary.

Overview

The shape end-to-end:

                     main agent
                     │  proposes action
                     supervisor (step in the loop)
                     │  fan out to specialists in parallel
                     specialist 1, 2, ... N
                     (regex · SQL · AST · classifier · narrow LLM)
                     aggregation: union, not vote
    ┌─────────┬──────┴─────┬───────────┐
    ▼         ▼            ▼           ▼
    ok        nudge        flag        block
    │         │            │           │
    execute   correction   execute     refuse;
              + replan     + record    main agent
              (feeds       for         must replan
              back to      human       from scratch
              main         review
              agent)

    replay artifact (signed, re-runnable)
    feedback log accumulation per deployment

Four moving parts. A taxonomy that names every known failure mode of the main agent. A specialist per entry in the taxonomy, narrow by construction. A decision layer that fans actions out and aggregates verdicts by union. A feedback log that turns every flag and every nudge into training signal for that specific deployment. Almost everything in this post is variations on those four parts.

What it actually is
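To make the decision layer from the overview concrete, here is a minimal Python sketch of the fan-out and the union aggregation. Everything named in it is an assumption for illustration: the two specialists (`destructive_sql`, `secret_leak`), their regexes, and the `supervise` function are hypothetical, not taken from any real system. The post does not say how mixed verdicts resolve, so the precedence used here (block over nudge over flag over ok, on the reasoning that a verdict refusing the action should outrank one allowing it) is also an assumption.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List, Tuple


class Verdict(IntEnum):
    # Precedence is an assumption of this sketch, not stated in the post:
    # a verdict that refuses the action outranks one that allows it.
    OK = 0
    FLAG = 1
    NUDGE = 2
    BLOCK = 3


@dataclass(frozen=True)
class Finding:
    failure_mode: str  # the taxonomy entry this specialist covers
    verdict: Verdict
    note: str = ""


# A specialist is a narrow check for exactly one taxonomy entry.
Specialist = Callable[[str], Finding]


def destructive_sql(action: str) -> Finding:
    # Hypothetical taxonomy entry: destructive or unscoped SQL.
    if re.search(r"\b(drop\s+table|truncate)\b", action, re.IGNORECASE):
        return Finding("destructive_sql", Verdict.BLOCK, "destructive statement")
    if re.search(r"\bdelete\s+from\b(?!.*\bwhere\b)", action, re.IGNORECASE):
        return Finding("destructive_sql", Verdict.NUDGE, "add a WHERE clause")
    return Finding("destructive_sql", Verdict.OK)


def secret_leak(action: str) -> Finding:
    # Hypothetical taxonomy entry: something shaped like an access key.
    if re.search(r"AKIA[0-9A-Z]{16}", action):
        return Finding("secret_leak", Verdict.FLAG, "possible access key")
    return Finding("secret_leak", Verdict.OK)


def supervise(action: str, specialists: List[Specialist]) -> Tuple[Verdict, List[Finding]]:
    # Fan out: every specialist sees the proposed action, in parallel.
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(lambda check: check(action), specialists))
    # Union, not vote: keep every non-OK finding. One objection is enough;
    # the other specialists cannot outvote it.
    raised = [f for f in findings if f.verdict != Verdict.OK]
    final = max((f.verdict for f in raised), default=Verdict.OK)
    return final, raised
```

Under these assumptions, `supervise("DELETE FROM users", [destructive_sql, secret_leak])` returns a nudge even though the other specialist said ok. The full list of raised findings travels with the final verdict, so a flag is still recorded when a nudge or block takes precedence.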

A main agent is the thing that does the work. It plans, calls tools, edits state, opens PRs, runs commands. It runs between human checkpoints. The interesting main agents in production today are coding agents, support agents, research agents, browsing agents, sales agents, ops agents. They share one property. They do real work without a human watching every action.

A supervisor agent is a separate process that sits inside the main agent’s cycle. It observes what the main agent is about to do. It decides whether that action is acceptable. Then it acts on that decision. The action might be ok (let it proceed), nudge (refuse this exact action but send a correction back so the main agent can replan), flag (allow but record for human review), or block (refuse outright and require the main agent to replan from scratch).

That’s the whole thing. Watch, decide, act, record. Inside the loop, every cycle.

Five properties follow from this definition. They’re what separate a supervisor from the four nearest things people confuse it with.

Separation with placement. A supervisor is a separate process. Its own weights, its own memory, its own prompts. But it sits in the main agent’s loop, not outside it. Every proposed action passes through it before execution. The independence is in state. The placement is in the cycle. If the supervisor lives inside the main agent’s reasoning, it isn’t a supervisor. It’s a self-critic, and self-critics fail in correlated ways with the agent they critique.

Deployment-time. Evals are good. Evals aren’t supervisors. Evals tell you how the agent did against a golden set last Tuesday. A supervisor tells you what is happening to a customer’s account right now. Most broken behaviour shows up only in deployment, against the real distribution, with real noise.

Memory across sessions. A guardrail evaluates one request at a time. A supervisor accumulates. It remembers that this particular main agent tried this particular trick three times this month. It remembers which nudges were heeded and which were ignored. Without accumulation, the supervisor is reset every session, and you’ve built a stateless check, not a supervisor.

A taxonomy of failure modes. The supervisor isn’t watching for “anything wrong”. It’s watching for a named, published list of ways this class of main agent is known to fail. Each named failure mode becomes a unit of decomposition. The taxonomy is the foundation of the whole system, and I’ll come back to it.

Authority across a graded set of verdicts. A supervisor is not a dashboard. It’s a process that can let an action through, return a correction, flag for human review, refuse outright, or roll a state back. The graded authority is what separates a supervisor from the observability stack. Observability tells you. A supervisor decides, and the decision feeds back into the main agent’s next plan.

If a system you’re looking at doesn’t have all five of those, it...
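The "memory across sessions" property is the easiest of the five to show in code. What follows is a rough sketch, assuming an append-only JSON Lines file as the per-deployment feedback log. The `FeedbackLog` class, its field names, the 30-day reading of "this month", and the `escalate` policy (a nudge repeated three times becomes a block) are all hypothetical choices for illustration, not something the post specifies.

```python
import json
import time
from pathlib import Path
from typing import List, Optional

MONTH_SECONDS = 30 * 24 * 3600  # assumption: "this month" means the last 30 days


class FeedbackLog:
    """Append-only, per-deployment log that survives across sessions."""

    def __init__(self, path: Path):
        self.path = path

    def record(self, agent_id: str, failure_mode: str, verdict: str,
               heeded: Optional[bool] = None, now: Optional[float] = None) -> None:
        entry = {
            "ts": time.time() if now is None else now,
            "agent_id": agent_id,
            "failure_mode": failure_mode,
            "verdict": verdict,
            "heeded": heeded,  # None when the outcome is not yet known
        }
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(entry) + "\n")

    def _entries(self) -> List[dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def recent_attempts(self, agent_id: str, failure_mode: str,
                        now: Optional[float] = None) -> int:
        # How often has this agent tried this particular trick lately?
        cutoff = (time.time() if now is None else now) - MONTH_SECONDS
        return sum(1 for e in self._entries()
                   if e["agent_id"] == agent_id
                   and e["failure_mode"] == failure_mode
                   and e["ts"] >= cutoff)

    def ignored_nudges(self, agent_id: str) -> int:
        # Which nudges were explicitly recorded as not heeded?
        return sum(1 for e in self._entries()
                   if e["agent_id"] == agent_id
                   and e["verdict"] == "nudge"
                   and e["heeded"] is False)


def escalate(verdict: str, attempts: int, threshold: int = 3) -> str:
    # Hypothetical policy: a nudge repeated `threshold` times becomes a block.
    if verdict == "nudge" and attempts >= threshold:
        return "block"
    return verdict
```

Because the state lives in the file rather than in the object, a fresh `FeedbackLog` pointed at the same path in a later session sees the earlier attempts. That is the difference between accumulation and a stateless check.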
