The Orchestration Tax

AddyOsmani.com - The Orchestration Tax

Starting more agents is easy now. But more agents running doesn’t mean more of you available: your cognitive bandwidth doesn’t parallelize. All the judgement needed to steer them and merge the code they write into the codebase still has to route through exactly one serial processor, which is you. The orchestration tax is the price you pay for forgetting this, and the only real fix is to start architecting your own attention the way you would architect any concurrent system.

I was on a panel at Google I/O this week with Richard Seroter, Aja Hammerly and Ciera Jaspan, talking about what software engineering looks like right now and how it will probably evolve. Near the end Richard asked us for one thing developers should walk away and do differently. I said the thing I have been circling around for months: feeling busy is definitely not the same as being productive. You can run 20 agents and feel completely busy. But that’s not 20 agents’ worth of shipped work.

Earlier in that chat Richard gave this problem a name. “You talked about the orchestration tax,” he said. “You can’t manage twenty agents successfully in your own brain.” He is totally right. I want to break this idea down properly because it’s not a discipline problem. It is an architecture problem.

The line from the panel I keep thinking about is something I said almost randomly: running multiple agents does not mean there is more of you.

The asymmetry people don’t price in

There is a hidden asymmetry in agentic workflows. Starting an agent is very cheap. It is just a keystroke or a one-sentence prompt. But closing the loop on the agent is not cheap at all. Someone has to check whether what came back is correct and reconcile it with whatever the other agents touched. That someone is you. And there is exactly one of you.

I wrote about a piece of this last month in Your parallel Agent limit, mostly about the ambient anxiety of not knowing which parallel thread is quietly failing. This post is about the actual shape underneath that cost. When you start seeing agent development as a concurrent system, you realize the human is just a component inside it. The slow serial component.

You are the single-threaded resource

If you have ever written concurrent code, you already have the right intuition. You have just been pointing it at the wrong part of the system.

Standard CPython has the Global Interpreter Lock (GIL). You can spawn as many threads as you want, but only one executes Python bytecode at a time because each thread must acquire the lock first. You are the GIL of your AI agents. They can all run at once. But when any of their work needs genuine understanding of the architecture or a merge conflict resolved, that work has to acquire the lock. There is one lock. You hold it.
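Here is a minimal sketch of that analogy. It does not touch the real GIL; it uses an ordinary `threading.Lock` as a stand-in for the one human reviewer. The names `run_agents` and `review_lock` are made up for illustration. Every agent thread produces its draft without the lock, but the review step can only happen while holding it, so at most one review is ever in progress.

```python
import threading


def run_agents(n_agents: int) -> tuple[int, int]:
    """Run n_agents threads that all funnel through one review lock.

    Returns (number of drafts reviewed, peak number of simultaneous reviews).
    """
    review_lock = threading.Lock()   # stand-in for the single human reviewer
    counter_lock = threading.Lock()  # protects the bookkeeping counters
    reviewed = []
    in_review = 0
    peak = 0

    def agent(name: str) -> None:
        nonlocal in_review, peak
        # Producing a draft needs no lock: agents do this part in parallel.
        draft = f"{name}: proposed change"
        # Reviewing does need the lock, and there is only one.
        with review_lock:
            with counter_lock:
                in_review += 1
                peak = max(peak, in_review)
            reviewed.append(draft)
            with counter_lock:
                in_review -= 1

    threads = [
        threading.Thread(target=agent, args=(f"agent-{i}",))
        for i in range(n_agents)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(reviewed), peak


if __name__ == "__main__":
    count, peak = run_agents(8)
    print(f"reviewed={count}, peak simultaneous reviews={peak}")
```

However many threads you start, the peak number of simultaneous reviews stays at one. That is the whole point of the analogy.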

Amdahl’s Law makes this very precise. The speedup you get from parallelizing is capped by the fraction of work that stays serial: if that fraction is s, the speedup can never exceed 1/s. If a big chunk of your pipeline can’t be parallelized, you top out at a hard limit no matter how many cores you throw at it. In agent development the serial fraction is the judgement. Spawning 8 agents doesn’t speed up your judgement time. It just makes the queue of things feeding into it much deeper.
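To put rough numbers on it, here is a small sketch of Amdahl’s Law, speedup = 1 / (s + (1 - s) / n). The 30% serial fraction is an assumption picked for illustration, not a measurement of anyone’s real workflow.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law: 1 / (s + (1 - s) / n)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def speedup_ceiling(serial_fraction: float) -> float:
    """Upper bound on speedup as the worker count grows without limit: 1 / s."""
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    s = 0.3  # assumed share of the work that is human judgement
    for n in (1, 8, 20):
        print(f"{n:>2} agents -> {amdahl_speedup(s, n):.2f}x")
    print(f"ceiling   -> {speedup_ceiling(s):.2f}x")
```

Under that assumption, going from 8 agents to 20 moves the speedup from about 2.58x to about 2.99x, and no number of agents gets past 3.33x.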

This is an old performance engineering fact that still surprises people: optimizing the non-bottleneck part doesn’t increase throughput. You just grow the pile of unfinished work sitting in front of the bottleneck. Adding agents optimizes the part that was never the constraint. The constraint is the review step, and the throughput of your system equals exactly the throughput of that step. The orchestration tax is the structural gap between what the agents produce and what you can actually merge. It’s what happens when you put a single-threaded resource in charge of a concurrent one.
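A toy simulation shows the same thing. All numbers here are assumptions for illustration: each agent hands back one change per hour, and the reviewer can properly review and merge three changes per hour. The function `simulate_day` is hypothetical.

```python
def simulate_day(agents: int, changes_per_agent_per_hour: int,
                 reviews_per_hour: int, hours: int) -> tuple[int, int]:
    """Return (changes merged, changes still waiting for review)."""
    backlog = 0
    merged = 0
    for _ in range(hours):
        # Agents produce in parallel; everything lands in the review queue.
        backlog += agents * changes_per_agent_per_hour
        # The reviewer is serial: a fixed number of reviews per hour at most.
        done = min(backlog, reviews_per_hour)
        merged += done
        backlog -= done
    return merged, backlog


if __name__ == "__main__":
    for n in (2, 8, 20):
        merged, waiting = simulate_day(n, 1, 3, 8)
        print(f"{n:>2} agents: merged={merged}, waiting={waiting}")
```

With these assumed rates, 8 agents and 20 agents both merge 24 changes in an eight-hour day. The only thing that changes is the pile left waiting: 40 changes versus 136.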

Grinding won’t fix structural limits

At the panel I said I have never felt more productive with my tools, but I am also more tired than I have ever been. Both halves are completely real and they have the same cause.

The tiredness has a very specific cause. It is what running a serial processor at 100% with no slack feels like. Every time you check on an agent you have been away from, you pay a context switch cost. You flush your brain and reload a different context from cold. CPUs do this in microseconds and architects still work hard to avoid it. You do it in minutes and you never reload the context perfectly. Five agents is not one workload done five times. It is five cold reloads plus a background brain process constantly worrying about which agent you should be checking.
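As a back-of-the-envelope model of that reload cost, the sketch below adds up attention minutes. The minute values and the `attention_cost` helper are assumptions for illustration only, and the model ignores the background worry entirely, so treat it as a lower bound.

```python
def attention_cost(agents: int, checks_per_agent: int,
                   reload_minutes: float, review_minutes: float) -> dict:
    """Split the reviewer's time into context reloads and actual review.

    Each check on an agent costs one cold reload; the review work itself
    is counted once per agent, however many visits it is spread across.
    """
    if agents < 1 or checks_per_agent < 1:
        raise ValueError("agents and checks_per_agent must be at least 1")
    if reload_minutes < 0 or review_minutes <= 0:
        raise ValueError("reload_minutes must be >= 0 and review_minutes > 0")
    reload_total = agents * checks_per_agent * reload_minutes
    review_total = agents * review_minutes
    total = reload_total + review_total
    return {
        "reload": reload_total,
        "review": review_total,
        "total": total,
        "overhead_share": reload_total / total,
    }


if __name__ == "__main__":
    # Assumed: 5 minutes to reload a context, 10 minutes of real review per agent.
    print(attention_cost(agents=1, checks_per_agent=1,
                         reload_minutes=5, review_minutes=10))
    print(attention_cost(agents=5, checks_per_agent=3,
                         reload_minutes=5, review_minutes=10))
```

In this model one agent checked once costs 15 minutes, a third of it reload. Five agents checked three times each cost 125 minutes, and 60% of that is reloading context rather than reviewing anything.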

You can’t just try harder to fix a structural limit. The tax will be paid anyway. If you try to grind it out, the limit just shows up as shallow code reviews, or as cognitive surrender, where you accept the agent’s code because forming your own opinion costs attention you don’t have anymore. You either pay the tax deliberately or you let it quietly destroy your understanding of your own system.

Architect your...
