Agent Architecture Is a Compute Allocation Problem: The Advisor Strategy


Anthropic named the advisor strategy in April. Tobi Lutke made it viral in May with Qwen plus GPT-5.5. Stanford's HazyResearch formalized the same shape earlier. One cost-curve frame unifies all three: a cheap executor runs the loop, and an expensive advisor weighs in only at hard decisions. The third recursion.

June 15, 2026

Harrison Guo

21 min read


In April 2026, Anthropic published a blog post called “The advisor strategy: Give agents an intelligence boost”, naming a pattern they had been A/B-testing in production: a cheaper model runs the agent loop end-to-end, and an expensive model is consulted only when the cheap one hits a decision it can’t solve. They reported concrete numbers: Haiku + Opus advisor on BrowseComp at 41.2% (Haiku alone: 19.7%) at 15% of the cost of running Sonnet through the whole task.

On May 18, 2026, Tobi Lutke (CEO of Shopify) tweeted about an autoresearch setup that did exactly this: Qwen 3.6 27B running locally on an RTX 6000, with a small “advisor extension” that periodically calls GPT-5.5 for direction. The tweet drew 13,000 impressions, 2,400 likes, and dozens of replies from engineers reproducing the pattern or building open-source implementations within hours.

Underneath both of those, Stanford HazyResearch’s Minions paper, published months earlier, had abstracted the same pattern into a compressor-predictor framework: a small local model distills raw context into compact text that a larger remote model then reasons over. They reported their Deep Research system recovering 99% of frontier-model accuracy at 26% of the API cost.

Three independent threads converged on the same architecture in roughly the same six-month window. That convergence is the story.

This post argues something specific about it: the advisor strategy isn’t a new pattern invented for LLMs. It’s the third recursion of the cost-curve frame from earlier in this mini-series: the same idea that argued grep beats RAG for code retrieval, and that argued SQLite + FTS5 beats a vector DB for the symbol-graph storage that grep-replacement tools (CodeGraph) need. Applied at the model-orchestration layer, the frame produces the advisor strategy.
The strategy is the architecture; the frame is why.

tl;dr: Anthropic, Tobi Lutke, and HazyResearch independently shipped (or described) the same agent pattern in early 2026: a cheap model runs the loop, and an expensive model is consulted only for decisions. The convergence is evidence the pattern is correct; the reason it’s correct is the cost-curve frame from this series’ first post, applied at the model-choice layer instead of the retrieval-architecture layer. Piece B argued grep+loop beats RAG because build/maintain cost dominates per-query cost below a crossover. The advisor strategy argues the same shape for tokens: cheap-model executor cost dominates expensive-model advisor cost for the bulk of low-value operations (reading context, format conversion, retries), so expensive-model tokens should be spent only at high-value decision points. Same frame, third layer.

The post does three things:

(1) reports the three converging threads and what each contributed;
(2) makes the cost-curve recursion argument explicit: L1 retrieval, L2 storage, L3 model orchestration;
(3) maps the gotchas the hype skips (data egress on handoff, eval difficulty, handoff-contract design as actual engineering, hardware realism).

The mini-series concludes here, five posts in, with the cost-curve frame as a meta-design law across three layers of agent architecture.
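To make the token-allocation argument concrete, here is a back-of-envelope sketch comparing two ways of paying for the same agent run: every step on an expensive model, versus a cheap executor that consults the expensive model every few steps. Every number in it (prices, token counts, consultation frequency) is a made-up placeholder chosen for round arithmetic, not a figure from Anthropic, Lutke, or HazyResearch, and the function names are hypothetical.

```python
def single_model_cost(steps: int, tokens_per_step: int, usd_per_mtok: float) -> float:
    """Cost in USD of running every loop step on one model."""
    return steps * tokens_per_step * usd_per_mtok / 1_000_000


def advisor_strategy_cost(
    steps: int,
    tokens_per_step: int,
    cheap_usd_per_mtok: float,
    expensive_usd_per_mtok: float,
    consult_every: int,
    tokens_per_consult: int,
) -> float:
    """Cost in USD when a cheap executor runs all steps and an expensive
    advisor is consulted once every `consult_every` steps (an assumption;
    real systems escalate on difficulty, not on a fixed schedule)."""
    executor = single_model_cost(steps, tokens_per_step, cheap_usd_per_mtok)
    consults = steps // consult_every
    advisor = consults * tokens_per_consult * expensive_usd_per_mtok / 1_000_000
    return executor + advisor


# Placeholder inputs, NOT real prices: $1 vs $15 per million tokens,
# 100 steps of 10k tokens, one 20k-token consultation every 10 steps.
all_expensive = single_model_cost(100, 10_000, 15.0)
mixed = advisor_strategy_cost(100, 10_000, 1.0, 15.0, 10, 20_000)

if __name__ == "__main__":
    print(f"all-expensive: ${all_expensive:.2f}")
    print(f"advisor strategy: ${mixed:.2f} ({mixed / all_expensive:.0%} of all-expensive)")
```

With these placeholder inputs the all-expensive run costs $15.00 and the mixed run costs $4.00. The point is the shape, not the ratio: the executor term scales with every step, while the advisor term scales only with the number of consultations, so the saving shrinks as consultations become more frequent or longer.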

Three convergent threads, in the order they shipped

The convergence matters more than any single thread. Each was independent; each shipped within a six-month window of the others; each describes the same architecture from a different vantage. That’s how you know the pattern is real and not just one team’s design preference.

Anthropic’s official advisor strategy (2026-04-09)

The Anthropic engineering blog post “The advisor strategy: Give agents an intelligence boost” defines the pattern as a productized engineering primitive:

“Sonnet or Haiku runs the task end-to-end as the executor… When the executor hits a decision it can’t reasonably solve, it consults Opus for guidance as the advisor.”

“The advisor never calls tools or produces user-facing output, and only provides guidance to the executor.”
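The division of labor in those two quotes can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not Anthropic's implementation or API: the models are plain Python callables, the `confident` flag stands in for whatever signal tells the executor it has hit a decision it can't reasonably solve, and all names (`Step`, `run_agent`, `toy_executor`, `toy_advisor`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    action: str          # what the executor wants to do next
    confident: bool      # assumed escalation signal; real triggers will differ
    done: bool = False   # True when the executor considers the task finished


# Hypothetical interfaces so the sketch runs without any SDK.
Executor = Callable[[str, List[str], Optional[str]], Step]
Advisor = Callable[[str, List[str]], str]


def run_agent(task: str, executor: Executor, advisor: Advisor,
              max_steps: int = 20) -> Tuple[List[str], int]:
    """Cheap executor drives the loop; advisor is consulted only on hard steps."""
    history: List[str] = []
    advisor_calls = 0
    for _ in range(max_steps):
        step = executor(task, history, None)
        if not step.confident:
            # The advisor returns guidance text only: no tools, no user output.
            guidance = advisor(task, history)
            advisor_calls += 1
            history.append(f"advice: {guidance}")
            # Re-ask the executor with guidance; we proceed with its answer
            # even if it is still unsure, to keep the sketch short.
            step = executor(task, history, guidance)
        history.append(f"action: {step.action}")  # tool calls would run here
        if step.done:
            break
    return history, advisor_calls


def toy_executor(task: str, history: List[str], guidance: Optional[str]) -> Step:
    """Stub executor: gets stuck on its second action unless given guidance."""
    n_actions = sum(1 for h in history if h.startswith("action:"))
    if n_actions == 1 and guidance is None:
        return Step(action="unsure", confident=False)
    if n_actions == 2:
        return Step(action="finish", confident=True, done=True)
    return Step(action=f"step-{n_actions}", confident=True)


def toy_advisor(task: str, history: List[str]) -> str:
    """Stub advisor: returns fixed guidance text."""
    return "take the second branch"


if __name__ == "__main__":
    log, calls = run_agent("demo task", toy_executor, toy_advisor)
    print(calls, log)
```

In the toy run the advisor is called once across three actions. The design choice worth noticing is that only the executor's output is ever appended as an action; the advisor's text enters the history as advice, which mirrors the constraint that the advisor never calls tools or produces user-facing output.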

The reported empirical numbers:

| Configuration | Benchmark | Score | Cost (relative to Sonnet end-to-end) |
| --- | --- | --- | --- |
| Sonnet alone (no advisor) | SWE-bench Multilingual | (baseline) | 1.00× |
| Sonnet + Opus advisor | SWE-bench Multilingual | exceeds baseline | 0.88× (−11.9%) |
| Haiku... | | | |
