An autopsy of Claude Code's deep research

Jun 10, 2026

San Francisco

Nikola Balic

I had Claude Code pry its own deep-research workflow out of its binary, then pointed that workflow at a question about itself. The verdict: it searches wide and never doubles back. The systems it resembles do. The whole game is that second hop.

Claude Code ships as a single compiled binary with its brain welded shut. Somewhere inside is the workflow it runs when you ask it to research the web: scope the question, fan out searches, read, vote, write. It does not ship as a readable file, so, for research purposes, I had Claude Code reconstruct the workflow from inside its own binary. A few minutes later the tool had performed surgery on itself and handed me the organ.

I had a theory: these "deep research" agents are not deep, they are wide. They take your question, spray it across a handful of parallel searches, pile up the results, and stamp the word deep on the box.

So I pointed the extracted workflow at one question, how do deep-research harnesses actually work, are they wide or deep, and let it research its own autopsy. What I watched it do confirmed the theory and broke it in the same run.

What I was actually holding

This is a Dynamic workflow, the kind Claude Code runs when you ask it to research the web. It executes on your own machine every time you run /deep-research. Everything in this piece describes the workflow as it ships in Claude Code v2.1.170; later builds may change it.

The header comment says it was "ported from a bughunter architecture," swapping git and grep for WebSearch and WebFetch. Someone built a bug-hunting agent, then noticed the same skeleton finds facts about the world as easily as it finds null-pointer dereferences.

Reference code is how patterns spread. People read it to learn the shape, then they ship the shape. So what matters is less what it does than what it teaches everyone who copies it.

How the deep-research workflow works

Five phases. Top to bottom. Once.
| Phase | What it does |
| --- | --- |
| Scope | One agent splits the question into 5 angles: broad, technical, recent, contrarian, practitioner. |
| Search | Five agents run in parallel, one per angle, each blind to the others. |
| Fetch + extract | Dedup URLs, cap at 15 sources. Each source yields 2-5 falsifiable claims, each with a direct quote and a source-quality grade. |
| Verify | Three skeptics per claim, each told to refute it. Two rejections out of three and the claim dies. |
| Synthesize | One agent merges survivors, ranks by confidence, writes the report with a note listing what got killed. |

The extraction step does not trust a webpage; it demands a checkable statement plus the quote that backs it. Verification is adversarial on purpose. If you wanted to teach someone how a research agent hangs together, you could do far worse than handing them this file. Good skeleton.

The thing I couldn't stop looking at

Then I read the prompt the harness hands each searcher. Every searcher is told to rank its results by relevance to the original question. The instruction is verbatim: "Rank by relevance to the ORIGINAL question, not just the search query." Every one of them starts from the same prompt the scoping agent wrote at second zero.

Nothing any searcher finds ever changes what gets searched. No agent reads a result, feels the tug of wait, that implies something, and forms a sharper question from it. The orchestrator does not loop. Scope happens once. Search happens once. The report is built from whatever that single sweep dragged up off the seabed.

That is the gap between wide and deep, and it fits in one picture.

Think about how you actually research something that matters. You search, you read, and the reading rewrites the next question. The answer to hop one becomes the input to hop two. A genealogist hits a misspelled surname in a parish record and that misspelling becomes the next query. A reporter notices the dates don't line up and chases the dates. Nobody writes five queries in advance and stops.
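To make the shape concrete, here is a minimal Python sketch of that single sweep. It is not the code that ships in Claude Code. The numbers (five angles, a cap of 15 sources, two rejections out of three) come from the phase table; every function and class name is hypothetical, the agents are replaced with stubs, dedup is assumed to be an exact URL match, and the searches run one after another here rather than in parallel.

```python
from dataclasses import dataclass

# Numbers taken from the phase table; every name below is hypothetical.
MAX_SOURCES = 15
REJECTIONS_TO_KILL = 2  # out of three skeptics


@dataclass(frozen=True)
class Claim:
    text: str
    quote: str
    source_url: str


def dedupe_and_cap(urls, cap=MAX_SOURCES):
    """Drop repeated URLs (exact match, first-seen order), keep at most `cap`."""
    return list(dict.fromkeys(urls))[:cap]


def survives(claim, skeptics):
    """Each skeptic returns True when it rejects the claim."""
    rejections = sum(1 for skeptic in skeptics if skeptic(claim))
    return rejections < REJECTIONS_TO_KILL


def single_pass(question, scope, search, extract, skeptics):
    """Scope once, search once, verify, return survivors. Nothing feeds back."""
    angles = scope(question)
    urls = [url for angle in angles for url in search(question, angle)]
    claims = [c for url in dedupe_and_cap(urls) for c in extract(url)]
    return [c for c in claims if survives(c, skeptics)]


# Stub agents so the sketch runs without a network or a model.
def scope(question):
    return ["broad", "technical", "recent", "contrarian", "practitioner"]


def search(question, angle):
    # Every angle also returns one shared URL, to exercise the dedup step.
    return [f"https://example.com/{angle}", "https://example.com/shared"]


def extract(url):
    return [Claim(text=f"claim from {url}", quote="(placeholder)", source_url=url)]


skeptics = [
    lambda c: "contrarian" in c.source_url,  # rejects claims from one source
    lambda c: "contrarian" in c.source_url,  # agrees with the first skeptic
    lambda c: False,                         # never rejects
]

report = single_pass("wide or deep?", scope, search, extract, skeptics)
print(len(report))  # six unique sources, one claim voted out: prints 5
```

Note what `single_pass` never does: it never passes a claim or a source back into `scope` or `search`. In this sketch, that missing feedback edge is the whole difference between wide and deep.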
The reference harness cannot follow a thread. That is the missing second hop.

Then I checked whether the big hosted products do the same thing under nicer branding.

Where my theory fell over

They don't. I read their own engineering writeups, and almost every serious one is a hybrid: parallel fan-out inside a round, genuine iteration across rounds.

Anthropic's multi-agent research system looks like the most wide-open thing in the field: a lead agent spinning up three to five subagents in parallel rather than serially. But read the next sentence. The lead agent "synthesizes these results and decides whether more research is needed, and if so, it can create additional subagents or refine its strategy." That is a loop. Even the...
