Agent Architecture Is a Compute Allocation Problem: The Advisor Strategy

Anthropic named the advisor strategy in April. Tobi Lutke made it viral in May with Qwen plus GPT-5.5. Stanford's HazyResearch formalized the same shape earlier. One cost-curve frame unifies all three: a cheap executor runs the loop, an expensive advisor weighs in only at hard decisions. The third recursion.

June 15, 2026

Harrison Guo


Runtime & Distributed Systems AI Agent Production Engineering

In April 2026, Anthropic published a blog post called “The advisor strategy: Give agents an intelligence boost”, naming a pattern they had been A/B-testing in production: a cheaper model runs the agent loop end-to-end, and an expensive model is consulted only when the cheap one hits a decision it can’t solve. They reported concrete numbers: Haiku with an Opus advisor scored 41.2% on BrowseComp (Haiku alone: 19.7%) at 15% of the cost of running Sonnet through the whole task.

On May 18, 2026, Tobi Lutke (CEO of Shopify) tweeted about an autoresearch setup that did exactly this: Qwen 3.6 27B running locally on an RTX 6000, with a small “advisor extension” that periodically calls GPT-5.5 for direction. The tweet drew 13,000 impressions, 2,400 likes, and dozens of replies from engineers reproducing the pattern or building open-source implementations within hours.

Underneath both of those, Stanford HazyResearch’s Minions paper, published months earlier, had abstracted the same pattern into a compressor-predictor framework: a small local model distills raw context into compact text that a larger remote model then reasons over. They reported their Deep Research system recovering 99% of frontier-model accuracy at 26% of the API cost.

Three independent threads converged on the same architecture in roughly the same six-month window. That convergence is the story.

This post argues something specific about it: the advisor strategy isn’t a new pattern invented for LLMs. It’s the third recursion of the cost-curve frame from earlier in this mini-series, the same idea that argued grep beats RAG for code retrieval, and that argued SQLite + FTS5 beats a vector DB for the symbol-graph storage that grep-replacement tools (CodeGraph) need. Applied at the model-orchestration layer, the frame produces the advisor strategy. The strategy is the architecture; the frame is why.
tl;dr: Anthropic, Tobi Lutke, and HazyResearch independently shipped (or described) the same agent pattern in early 2026: a cheap model runs the loop, and an expensive model is consulted only for decisions. The convergence is evidence the pattern is correct; the reason it’s correct is the cost-curve frame from this series’ first post, applied at the model-choice layer instead of the retrieval-architecture layer.

Piece B argued grep+loop beats RAG because build/maintain cost dominates per-query cost below a crossover. The advisor strategy argues the same shape for tokens: cheap-model executor cost dominates expensive-model advisor cost for the bulk of low-value operations (reading context, format conversion, retries), so expensive-model tokens should be spent only at high-value decision points. Same frame, third layer.

The post does three things: (1) reports the three converging threads with what each contributed; (2) makes the cost-curve recursion argument explicitly: L1 retrieval, L2 storage, L3 model orchestration; (3) maps the gotchas the hype skips (data egress on handoff, eval difficulty, handoff-contract design as actual engineering, hardware realism). The mini-series concludes here, five posts in, with the cost-curve frame as a meta-design law across three layers of agent architecture.
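To make the token-level version of the argument concrete, here is a minimal cost sketch. Everything in it is an assumption for illustration: the `ModelPrice` type, the flat per-million-token prices, and the idea that a fixed fraction of a task's tokens goes to the advisor. Real pricing separates input and output tokens, and an advisor call usually re-reads some context, so treat this as the shape of the curve rather than a billing model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical flat price per million tokens (input and output blended)."""
    usd_per_mtok: float


def run_cost(tokens: int, price: ModelPrice) -> float:
    """Cost in USD of pushing `tokens` through a model at `price`."""
    return tokens / 1_000_000 * price.usd_per_mtok


def advisor_cost(total_tokens: int, advisor_fraction: float,
                 cheap: ModelPrice, expensive: ModelPrice) -> float:
    """Cost when only `advisor_fraction` of the tokens hit the expensive model."""
    if not 0.0 <= advisor_fraction <= 1.0:
        raise ValueError("advisor_fraction must be in [0, 1]")
    advisor_tokens = int(total_tokens * advisor_fraction)
    executor_tokens = total_tokens - advisor_tokens
    return run_cost(executor_tokens, cheap) + run_cost(advisor_tokens, expensive)


# Illustrative numbers only; not taken from any vendor's price list.
CHEAP = ModelPrice(usd_per_mtok=1.0)
EXPENSIVE = ModelPrice(usd_per_mtok=20.0)

baseline = run_cost(2_000_000, EXPENSIVE)                   # expensive model end-to-end
blended = advisor_cost(2_000_000, 0.05, CHEAP, EXPENSIVE)   # 5% of tokens escalated
ratio = blended / baseline

print(f"baseline ${baseline:.2f}, blended ${blended:.2f}, ratio {ratio:.4f}")
```

With these made-up prices, routing 5% of tokens to the expensive model costs roughly a tenth of running it end-to-end. The ratio climbs linearly with `advisor_fraction`, which is why the escalation rule matters more than the choice of cheap model.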

Three convergent threads, in the order they shipped

The convergence matters more than any single thread. Each was independent; each shipped within a six-month window of the others; each describes the same architecture from a different vantage. That’s how you know the pattern is real and not just one team’s design preference.

Anthropic’s official advisor strategy (2026-04-09)

The Anthropic engineering blog “The advisor strategy: Give agents an intelligence boost” defines the pattern as a productized engineering primitive:

“Sonnet or Haiku runs the task end-to-end as the executor… When the executor hits a decision it can’t reasonably solve, it consults Opus for guidance as the advisor.”

“The advisor never calls tools or produces user-facing output, and only provides guidance to the executor.”
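The contract in the two quotes above can be sketched as a small control loop. This is not Anthropic's implementation. The names `run_task`, `RunLog`, `toy_executor`, and `toy_advisor` are hypothetical, both models are stubbed as plain Python callables so the sketch runs without any SDK, and using a self-reported confidence threshold as the escalation trigger is an assumption (the post only says the executor consults the advisor when it hits a decision it can't reasonably solve).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# (step, optional guidance) -> (action taken, self-reported confidence in [0, 1])
Executor = Callable[[str, Optional[str]], tuple[str, float]]
# problem summary -> guidance text; the advisor has no tool access by construction
Advisor = Callable[[str], str]


@dataclass
class RunLog:
    actions: list[str] = field(default_factory=list)
    advisor_calls: int = 0


def run_task(steps: list[str], executor: Executor, advisor: Advisor,
             threshold: float = 0.6) -> RunLog:
    """Executor owns the loop; the advisor only returns text on escalation."""
    log = RunLog()
    for step in steps:
        action, confidence = executor(step, None)
        if confidence < threshold:
            # Escalate: the advisor sees a summary and returns guidance only.
            guidance = advisor(f"stuck on: {step}")
            log.advisor_calls += 1
            # The executor, not the advisor, produces the final action.
            action, _ = executor(step, guidance)
        log.actions.append(action)
    return log


def toy_executor(step: str, hint: Optional[str]) -> tuple[str, float]:
    """Stand-in for the cheap model: unsure about any step marked 'hard'."""
    if hint is not None:
        return f"{step} [guided: {hint}]", 1.0
    return step, (0.2 if "hard" in step else 0.9)


def toy_advisor(summary: str) -> str:
    """Stand-in for the expensive model: returns advice, never an action."""
    return "split the problem in two"


log = run_task(["read file", "hard: pick schema", "write patch"],
               toy_executor, toy_advisor)
print(log.advisor_calls, log.actions)
```

The design point the types try to capture is that `Advisor` returns a string and nothing else, so every tool call and every user-facing output still goes through the executor.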

The reported empirical numbers:

| Configuration | Benchmark | Score | Cost (relative to Sonnet end-to-end) |
| --- | --- | --- | --- |
| Sonnet alone (no advisor) | SWE-bench Multilingual | (baseline) | 1.00× |
| Sonnet + Opus advisor | SWE-bench Multilingual | exceeds baseline | 0.88× (−11.9%) |
| Haiku... | | | |
