Defensible Deep Research from Open-Weight Models

Thinkwright

I've been working on a custom harness for myself. Everyone's working on harness engineering today. But while working on mine, I became very interested in making sure it could get deep research right: effective, cheap, and trustworthy. Mostly I wanted to know: can I get the harness to delegate reading and drafting while keeping final claims tied to sources and checked for accuracy?

Architecture first: The coordinator stays responsible for source selection, routing, and final judgment.

Cheap middle passes: Long reading and drafting move to lower-cost open-weight workers where the work is bounded.

Verification boundary: The check can catch unsupported claims when source text is present, but weak source material still has to be labeled.

Parts & Services

Deep-research worker workflow A coordinator fetches, flattens, and tags sources into a markdown pack, fans work out to parallel Gemma compression jobs, merges notes into Nemotron synthesis, verifies source-sensitive claims against fetched text, and ships a report with confidence tiers.

Coordinator (Opus 4.8 / GPT-5.5): fetch + tag

Markdown pack: flattened sources, URLs + tags

Gemma 4 31B-it FP8: compress shards A, B, ... N in parallel (delegate_batch)

Nemotron 3 Ultra (550B-A55B NVFP4): synthesize

Coordinator (Opus 4.8 / GPT-5.5): verify + ship

Dashed line: check back to sources

Reading and drafting move to open-weight workers; source tags, confidence tiers, and coordinator checks keep the output usable.

The first handoff

The first handoff keeps Gemma out of open-ended research. The coordinator searches, opens pages, and decides what belongs in the source set. The fetched material is flattened into a markdown source pack before it reaches the worker.

Search and select: The coordinator handles live web search, source choice, and the first judgment about relevance.

Flatten to markdown: Pages become a stable source pack with titles, URLs, source tags, and task boundaries.

Compress with Gemma: Gemma gets a bounded reading job: preserve figures, qualifiers, and source tags.

I set it up this way because web search and source judgment are where I want the stronger model. The lower-cost worker gets a source pack and a compression task, not a blank research assignment.
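To make the handoff concrete, here is a minimal sketch of flattening fetched pages into a markdown source pack. The `Source` type, the `build_source_pack` function, the `S1`-style tags, and the exact layout are my illustrative assumptions, not the harness's actual format; the only parts taken from the description above are the ingredients (titles, URLs, source tags, and a task boundary).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    tag: str    # short source tag such as "S1" (hypothetical naming scheme)
    title: str
    url: str
    text: str   # page text already flattened to plain text or markdown


def build_source_pack(task: str, sources: list[Source]) -> str:
    """Render fetched sources as one markdown document for a worker."""
    parts = [
        "# Source pack",
        "",
        f"Task: {task}",
        # The boundary line tells the worker this is a bounded reading job.
        "Boundary: use only the sources below; do not search or add outside facts.",
        "",
    ]
    for src in sources:
        parts += [
            f"## [{src.tag}] {src.title}",
            f"URL: {src.url}",
            "",
            src.text.strip(),
            "",
        ]
    return "\n".join(parts)


if __name__ == "__main__":
    demo = [Source("S1", "Example page", "https://example.com/a", "Placeholder body text.")]
    print(build_source_pack("Compress; preserve figures, qualifiers, and source tags.", demo))
```

Putting the tag in each heading means any figure the worker carries forward can be traced back to a specific section of the pack.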

Before going further, a bit about the harness itself. I think of it as a terminal workbench with six practical pieces.

Coordinator: A frontier model owns the conversation, fetches sources, routes work, and makes the final call.

Worker registry: Named lower-cost models are available for bounded reading, compression, and drafting jobs.

Delegation tools: delegate_worker sends one brief; delegate_batch fans out independent briefs in parallel.

Prompt fragments: The run starts with instructions about available workers, when to delegate, and what must be verified.

Run log: Handoffs and outputs are preserved so I can trace mistakes back to source selection, compression, synthesis, or verification.

Learning layer: Reusable procedures are a design goal; no dynamic procedure builder is running in this article.
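The two delegation tools can be sketched roughly as follows. The names `delegate_worker` and `delegate_batch` come from the harness, but the signatures, the `Worker` callable type, the `max_parallel` parameter, and the thread-pool approach are assumptions made for illustration. A real worker would wrap a model API call; here it is a stub function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumption: a worker is anything that takes a brief and returns text.
Worker = Callable[[str], str]


def delegate_worker(worker: Worker, brief: str) -> str:
    """Send one brief to one worker and return its output."""
    return worker(brief)


def delegate_batch(worker: Worker, briefs: list[str], max_parallel: int = 4) -> list[str]:
    """Fan independent briefs out in parallel; results keep the input order."""
    if not briefs:
        return []
    with ThreadPoolExecutor(max_workers=min(max_parallel, len(briefs))) as pool:
        # Executor.map yields results in the order the briefs were submitted.
        return list(pool.map(lambda brief: delegate_worker(worker, brief), briefs))


def fake_compressor(brief: str) -> str:
    """Stand-in for a model call, used only to exercise the sketch."""
    return f"notes for: {brief}"


if __name__ == "__main__":
    print(delegate_batch(fake_compressor, ["shard A", "shard B", "shard N"]))
```

Threads suit this shape of work because each brief mostly waits on a remote model response, and keeping results in input order makes it easy to line notes up with their shards in the run log.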

The run this article is about was a datacenter supply-chain briefing for 2026: transformers, power delivery, interconnection queues, 800 VDC architecture, packaging constraints, and weak signals around solid-state transformers. I checked the report's claims against the fetched source pack and its own confidence table, so the discussion below stays attached to that artifact.

The plumbing is straightforward. The coordinator fetches sources and decides what work can leave its own context; Gemma 4 31B-it compresses source shards in parallel; Nemotron 3 Ultra writes from those compressed notes; and the final pass returns to the coordinator for source-sensitive checks. The check is mostly mechanical: find the source sentence behind the number or claim, then keep, correct, or cut it.
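A toy version of that mechanical check, for numeric claims only, might look like the sketch below. Everything in it is an assumption for illustration: the function name, the naive sentence splitter, and the word-overlap heuristic are mine, and in the actual run the coordinator model does this step with judgment rather than a regex. The sample text in the usage lines is invented.

```python
import re
from typing import Optional

NUMBER = re.compile(r"\d+(?:\.\d+)?")
WORD = re.compile(r"[a-z]{4,}")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check_numeric_claim(claim: str, source_text: str) -> tuple[str, Optional[str]]:
    """Return (verdict, source sentence), where verdict is keep, correct, or cut."""
    claim_numbers = set(NUMBER.findall(claim))
    if not claim_numbers:
        raise ValueError("this sketch only handles claims that contain a number")
    claim_words = set(WORD.findall(claim.lower()))
    related: Optional[str] = None
    for sentence in split_sentences(source_text):
        if not claim_words & set(WORD.findall(sentence.lower())):
            continue  # sentence is about something else
        if claim_numbers <= set(NUMBER.findall(sentence)):
            return ("keep", sentence)  # same subject words, same figures
        related = related or sentence  # same subject words, different figures
    return ("correct", related) if related else ("cut", None)


if __name__ == "__main__":
    source = "The widget queue grew to 40 units in March. Prices were flat."
    print(check_numeric_claim("The widget queue grew to 40 units.", source))
    print(check_numeric_claim("The widget queue grew to 55 units.", source))
    print(check_numeric_claim("Gadget output hit 9 tonnes.", source))
```

A "correct" verdict here only means a related sentence exists with different figures; deciding what the corrected claim should say still needs a reader.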

Methodology & Confidence

The datacenter supply-chain report included more than the main analysis. It also included a method note explaining how sources moved through the chain and a confidence table separating corroborated claims from single-source or low-reliability ones. That helps readers see which parts of the analysis are strong and which parts need caution.
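One way to picture the confidence table is as a small function over the evidence behind each claim. The tier names follow the categories above (corroborated, single-source, low-reliability), but the thresholds, the `Evidence` type, and the table layout are assumptions for this sketch, not the rules the report actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source_tag: str
    reliable: bool  # reliability hint carried over from the compression notes


def confidence_tier(evidence: list[Evidence]) -> str:
    """Assumed rule: two or more distinct reliable sources count as corroborated."""
    reliable_sources = {e.source_tag for e in evidence if e.reliable}
    if len(reliable_sources) >= 2:
        return "corroborated"
    if len(reliable_sources) == 1:
        return "single-source"
    return "low-reliability"  # also covers a claim with no evidence at all


def confidence_table(claims: dict[str, list[Evidence]]) -> str:
    """Render one markdown table row per claim."""
    rows = ["| Claim | Sources | Confidence |", "| --- | --- | --- |"]
    for claim, evidence in claims.items():
        tags = ", ".join(sorted({e.source_tag for e in evidence})) or "none"
        rows.append(f"| {claim} | {tags} | {confidence_tier(evidence)} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(confidence_table({
        "Claim A": [Evidence("S1", True), Evidence("S2", True)],
        "Claim B": [Evidence("S3", False)],
    }))
```

Counting distinct source tags rather than raw mentions keeps one source repeated across several shards from looking like corroboration.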

01 Fetch sources: The coordinator gathers live source material and decides how to split the reading.

02 Compress notes: Gemma preserves figures with subjects, source tags, and reliability hints.

03 Synthesize: Nemotron writes from supplied notes and is told to keep gaps and hedges visible.

04 Verify claims: The coordinator checks source-sensitive claims before the report is treated as usable.

The workflow does not make research deterministic. It adds friction where mistakes matter: figures carry source tags, weak evidence is labeled instead of smoothed over, and high-impact claims are checked before the report is treated as...
