Defensible Deep Research from Open-Weight Models — thinkwright
I've been working on a custom harness for myself. Everyone's working on harness engineering today. While building mine, I became very interested in whether it could get deep research right: effective, cheap, and trustworthy. Mostly I wanted to know: can the harness delegate reading and drafting while keeping final claims tied to sources and checked for accuracy?
Architecture first<br>The coordinator stays responsible for source selection, routing, and final judgment.
Cheap middle passes<br>Long reading and drafting move to lower-cost open-weight workers where the work is bounded.
Verification boundary<br>The check can catch unsupported claims when source text is present, but weak source material still has to be labeled.
Parts & Services
Deep-research worker workflow<br>A coordinator fetches, flattens, and tags sources into a markdown pack, fans work out to parallel Gemma compression jobs, merges notes into Nemotron synthesis, verifies source-sensitive claims against fetched text, and ships a report with confidence tiers.
Coordinator<br>Fetch + tag<br>Opus 4.8 / GPT-5.5
Markdown pack<br>Flattened sources<br>URLs + tags
Gemma 4 31B-it FP8<br>Compress<br>shard A
Gemma 4 31B-it FP8<br>Compress<br>shard B
Gemma 4 31B-it FP8<br>Compress<br>shard N
Nemotron 3 Ultra<br>Synthesize<br>550B-A55B NVFP4
Coordinator<br>Verify + ship<br>Opus 4.8 / GPT-5.5
delegate_batch: parallel compression<br>dashed line: check back to sources
Reading and drafting move to open-weight workers; source tags, confidence tiers, and coordinator checks keep the output usable.
The first handoff
The first handoff keeps Gemma out of open-ended research. The coordinator searches, opens pages, and decides what belongs in the source set. The fetched material is flattened into a markdown source pack before it reaches the worker.
Search and select<br>The coordinator handles live web search, source choice, and the first judgment about relevance.
Flatten to markdown<br>Pages become a stable source pack with titles, URLs, source tags, and task boundaries.
Compress with Gemma<br>Gemma gets a bounded reading job: preserve figures, qualifiers, and source tags.
I set it up this way because web search and source judgment are where I want the stronger model. The lower-cost worker gets a source pack and a compression task, not a blank research assignment.
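To make the flattening step concrete, here is a minimal sketch of a source pack builder. The `Source` fields, the `[S1]`-style tag format, and the `build_source_pack` name are assumptions for illustration, not the harness's actual code; the URLs and page text are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Source:
    tag: str    # short source tag, e.g. "S1" (hypothetical format)
    title: str
    url: str
    text: str   # page text already flattened to plain text


def build_source_pack(task: str, sources: list[Source]) -> str:
    """Render fetched sources as one markdown document for a worker."""
    parts = [
        "# Source pack",
        "",
        f"Task: {task}",
        "Use only the sources below. Keep the [tag] next to every figure.",
    ]
    for src in sources:
        # One section per source: the tag and title in the heading, then URL, then text.
        parts += ["", f"## [{src.tag}] {src.title}", f"URL: {src.url}", "", src.text.strip()]
    return "\n".join(parts) + "\n"


pack = build_source_pack(
    "Compress these sources; preserve figures, qualifiers, and source tags.",
    [
        Source("S1", "Example page", "https://example.com/a", "Placeholder text for the first source."),
        Source("S2", "Another page", "https://example.com/b", "Placeholder text for the second source."),
    ],
)
print(pack)
```

Putting the task line at the top of the pack is one way to state the task boundary: the worker sees what it is allowed to do before it sees any source text.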
First, just a bit about the harness itself. I think of it as a terminal workbench with six practical pieces.
Coordinator<br>A frontier model owns the conversation, fetches sources, routes work, and makes the final call.
Worker registry<br>Named lower-cost models are available for bounded reading, compression, and drafting jobs.
Delegation tools<br>delegate_worker sends one brief; delegate_batch fans out independent briefs in parallel.
Prompt fragments<br>The run starts with instructions about available workers, when to delegate, and what must be verified.
Run log<br>Handoffs and outputs are preserved so I can trace mistakes back to source selection, compression, synthesis, or verification.
Learning layer<br>Reusable procedures are a design goal; no dynamic procedure builder is running in this article.
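The two delegation tools can be sketched roughly as follows. Only the names `delegate_worker` and `delegate_batch` come from the harness; the signatures, the `Worker` type, the `max_parallel` parameter, and the `fake_compressor` stand-in are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A worker is modeled here as a plain function from brief text to output text.
Worker = Callable[[str], str]


def delegate_worker(worker: Worker, brief: str) -> str:
    """Send one brief to one worker and return its output."""
    return worker(brief)


def delegate_batch(worker: Worker, briefs: list[str], max_parallel: int = 4) -> list[str]:
    """Fan out independent briefs in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda b: delegate_worker(worker, b), briefs))


def fake_compressor(brief: str) -> str:
    # Stand-in for a model call; a real worker would call an inference endpoint.
    return f"notes({brief})"


results = delegate_batch(fake_compressor, ["shard A", "shard B", "shard C"])
print(results)
```

Threads are a reasonable fit in this sketch because a model call is mostly waiting on I/O, and `pool.map` returns results in input order, so each set of notes stays matched to its shard.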
The run this article is about was a datacenter supply-chain briefing for 2026: transformers, power delivery, interconnection queues, 800 VDC architecture, packaging constraints, and weak signals around solid-state transformers. I checked the report's claims against the fetched source pack and its own confidence table, so the discussion below stays attached to that artifact.
The plumbing is straightforward. The coordinator fetches sources and decides what work can leave its own context. Gemma 4 31B-it compresses source shards in parallel, Nemotron 3 Ultra writes from those compressed notes, and the final pass returns to the coordinator for source-sensitive checks. The check is mostly mechanical: find the source sentence behind the number or claim, then keep, correct, or cut it.
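A rough sketch of the mechanical part of that check, assuming the claim's numbers are the thing to trace. The `check_claim` function, its status strings, and the number-matching heuristic are illustrative assumptions, and the sample sentences are invented toy text rather than anything from the report. In the real workflow the coordinator model makes the final call; this only narrows down where to look.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check_claim(claim: str, source_text: str) -> tuple[str, list[str]]:
    """Return a suggested action and the source sentences that motivated it."""
    wanted = set(NUMBER.findall(claim))
    if not wanted:
        return "needs_manual_check", []   # nothing numeric to trace
    full, partial = [], []
    for s in sentences(source_text):
        found = set(NUMBER.findall(s))
        if wanted <= found:
            full.append(s)                # every number in the claim appears here
        elif wanted & found:
            partial.append(s)             # some overlap: candidate for correction
    if full:
        return "keep", full
    if partial:
        return "correct", partial
    return "cut", []


# Invented toy source text, used only to exercise the function.
SOURCE = "The pilot line has 12 stations. Average cycle time was 30 seconds in the test run."

print(check_claim("Cycle time averaged 30 seconds.", SOURCE))
print(check_claim("The 12 stations averaged 45 seconds.", SOURCE))
print(check_claim("Cycle time averaged 45 seconds.", SOURCE))
```

Matching on numbers alone is deliberately crude: a sentence can contain the right figure attached to the wrong subject, so a "keep" here means "a candidate sentence exists", not "the claim is verified".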
Methodology & Confidence
The datacenter supply-chain report included more than the main analysis. It also included a method note explaining how sources moved through the chain and a confidence table separating corroborated claims from single-source or low-reliability ones. That helps readers see which parts of the analysis are strong and which parts need caution.
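One way such a confidence table could be produced is sketched below. The tier names follow the wording above, but the thresholds (two or more reliable sources means corroborated), the extra "unsupported" tier, the `Claim` structure, and the placeholder claims are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    source_tags: list[str]   # tags of the sources that support the claim


def confidence_tier(claim: Claim, low_reliability: set[str]) -> str:
    """Assign a tier from how many reliable sources back the claim."""
    if not claim.source_tags:
        return "unsupported"
    reliable = {t for t in claim.source_tags if t not in low_reliability}
    if len(reliable) >= 2:
        return "corroborated"
    if len(reliable) == 1:
        return "single-source"
    return "low-reliability"


def confidence_table(claims: list[Claim], low_reliability: set[str]) -> str:
    """Render the claims as a markdown table with one tier per row."""
    rows = ["| Claim | Sources | Tier |", "| --- | --- | --- |"]
    for c in claims:
        tags = ", ".join(c.source_tags) or "none"
        rows.append(f"| {c.text} | {tags} | {confidence_tier(c, low_reliability)} |")
    return "\n".join(rows)


# Placeholder claims; "S4" is treated as a low-reliability source in this example.
claims = [
    Claim("Claim A", ["S1", "S2"]),
    Claim("Claim B", ["S3"]),
    Claim("Claim C", ["S4"]),
    Claim("Claim D", []),
]
table = confidence_table(claims, low_reliability={"S4"})
print(table)
```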
01<br>Fetch sources<br>The coordinator gathers live source material and decides how to split the reading.
02<br>Compress notes<br>Gemma preserves figures with subjects, source tags, and reliability hints.
03<br>Synthesize<br>Nemotron writes from supplied notes and is told to keep gaps and hedges visible.
04<br>Verify claims<br>The coordinator checks source-sensitive claims before the report is treated as usable.
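The four steps can be wired together in a few lines, with each handoff written to a run log so a mistake can be traced to the stage that introduced it. This is a sketch under assumptions: `RunLog`, `run_pipeline`, and the stage callables are hypothetical names, the stages run sequentially here for brevity, and the lambdas are stubs standing in for model calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunLog:
    entries: list[tuple[str, str]] = field(default_factory=list)

    def record(self, stage: str, output: str) -> str:
        """Store a (stage, output) pair and pass the output through."""
        self.entries.append((stage, output))
        return output


def run_pipeline(
    fetch: Callable[[], list[str]],
    compress: Callable[[str], str],
    synthesize: Callable[[str], str],
    verify: Callable[[str, list[str]], str],
    log: RunLog,
) -> str:
    shards = fetch()                                             # 01 fetch sources
    log.record("fetch", f"{len(shards)} shards")
    notes = [log.record(f"compress[{i}]", compress(s))           # 02 compress notes
             for i, s in enumerate(shards)]
    draft = log.record("synthesize", synthesize("\n".join(notes)))  # 03 synthesize
    return log.record("verify", verify(draft, shards))           # 04 verify claims


log = RunLog()
report = run_pipeline(
    fetch=lambda: ["shard one", "shard two"],
    compress=lambda s: f"note<{s}>",
    synthesize=lambda notes: f"draft from {len(notes.splitlines())} notes",
    verify=lambda draft, shards: draft + " (checked)",
    log=log,
)
print(report)
print([stage for stage, _ in log.entries])
```

Note that `verify` receives the original shards rather than the compressed notes, which mirrors the idea that the final check goes back to the fetched text and not to a worker's summary of it.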
The workflow does not make research deterministic. It adds friction where mistakes matter: figures carry source tags, weak evidence is labeled instead of smoothed over, and high-impact claims are checked before the report is treated as...