Defensible Deep Research from Open-Weight Models


Thinkwright


I've been building a custom harness for myself, as seemingly everyone is these days. While working on mine, I became very interested in whether it could get deep research right: effective, cheap, and trustworthy. The core question: can the harness delegate reading and drafting while keeping final claims tied to sources and checked for accuracy?

Architecture first<br>The coordinator stays responsible for source selection, routing, and final judgment.

Cheap middle passes<br>Long reading and drafting move to lower-cost open-weight workers where the work is bounded.

Verification boundary<br>The check can catch unsupported claims when source text is present, but weak source material still has to be labeled.

Parts & Services

Deep-research worker workflow<br>A coordinator fetches, flattens, and tags sources into a markdown pack, fans work out to parallel Gemma compression jobs, merges notes into Nemotron synthesis, verifies source-sensitive claims against fetched text, and ships a report with confidence tiers.

Diagram: Coordinator (fetch + tag; Opus 4.8 / GPT-5.5) → Markdown pack (flattened sources; URLs + tags) → Gemma 4 31B-it FP8 compression on shards A, B, … N (delegate_batch: parallel compression) → Nemotron 3 Ultra synthesis (550B-A55B NVFP4) → Coordinator (verify + ship; Opus 4.8 / GPT-5.5). A dashed line marks the check back to sources.

Reading and drafting move to open-weight workers; source tags, confidence tiers, and coordinator checks keep the output usable.

The first handoff

The first handoff keeps Gemma out of open-ended research. The coordinator searches, opens pages, and decides what belongs in the source set. The fetched material is flattened into a markdown source pack before it reaches the worker.

Search and select<br>The coordinator handles live web search, source choice, and the first judgment about relevance.

Flatten to markdown<br>Pages become a stable source pack with titles, URLs, source tags, and task boundaries.

Compress with Gemma<br>Gemma gets a bounded reading job: preserve figures, qualifiers, and source tags.

I set it up this way because web search and source judgment are where I want the stronger model. The lower-cost worker gets a source pack and a compression task, not a blank research assignment.
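To make the flattening step concrete, here is a minimal sketch of how a source pack might be assembled. The `Source` fields, the `S1`-style tags, and the heading layout are assumptions for illustration, not the harness's actual format.

```python
from dataclasses import dataclass


@dataclass
class Source:
    tag: str    # short source tag such as "S1" (hypothetical scheme)
    title: str
    url: str
    text: str   # page text, already stripped of markup by the coordinator


def build_source_pack(task: str, sources: list[Source]) -> str:
    """Flatten fetched sources into one markdown document for a worker."""
    # The task boundary goes first so the worker sees its job before the sources.
    parts = [f"# Task\n\n{task.strip()}\n"]
    for src in sources:
        parts.append(
            f"## [{src.tag}] {src.title.strip()}\n\n"
            f"URL: {src.url}\n\n"
            f"{src.text.strip()}\n"
        )
    return "\n".join(parts)


# Placeholder sources, not taken from the real run.
pack = build_source_pack(
    "Compress each source. Preserve figures, qualifiers, and source tags.",
    [
        Source("S1", "Example page", "https://example.com/a", "Placeholder text."),
        Source("S2", "Another page", "https://example.com/b", "More placeholder text."),
    ],
)
```

Keeping the tag in every section heading means a compressed note can cite `[S1]` and still be traced back to a URL later.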

A bit about the harness itself: I think of it as a terminal workbench with six practical pieces.

Coordinator<br>A frontier model owns the conversation, fetches sources, routes work, and makes the final call.

Worker registry<br>Named lower-cost models are available for bounded reading, compression, and drafting jobs.

Delegation tools<br>delegate_worker sends one brief; delegate_batch fans out independent briefs in parallel.

Prompt fragments<br>The run starts with instructions about available workers, when to delegate, and what must be verified.

Run log<br>Handoffs and outputs are preserved so I can trace mistakes back to source selection, compression, synthesis, or verification.

Learning layer<br>Reusable procedures are a design goal; no dynamic procedure builder is running in this article.
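The worker registry, delegation tools, and run log could fit together roughly as follows. Only the names `delegate_worker` and `delegate_batch` come from the article; the signatures, the `WORKERS` registry, the `RUN_LOG` shape, and the stub worker are assumptions standing in for real model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A worker is any callable that takes a brief and returns text.
Worker = Callable[[str], str]

# Hypothetical registry; the lambda is a stub, not a real model call.
WORKERS: dict[str, Worker] = {
    "compressor": lambda brief: f"notes for: {brief}",
}

# Every handoff is recorded so a mistake can be traced to a stage.
RUN_LOG: list[dict[str, str]] = []


def delegate_worker(worker: str, brief: str) -> str:
    """Send one brief to one named worker and record the handoff."""
    output = WORKERS[worker](brief)
    RUN_LOG.append({"worker": worker, "brief": brief, "output": output})
    return output


def delegate_batch(worker: str, briefs: list[str], max_parallel: int = 4) -> list[str]:
    """Fan independent briefs out in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda b: delegate_worker(worker, b), briefs))


notes = delegate_batch("compressor", ["shard A", "shard B", "shard C"])
```

Threads are a reasonable fit here on the assumption that each worker call is a network request, so the jobs spend most of their time waiting rather than computing.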

The run this article is about was a datacenter supply-chain briefing for 2026: transformers, power delivery, interconnection queues, 800 VDC architecture, packaging constraints, and weak signals around solid-state transformers. I checked the report's claims against the fetched source pack and its own confidence table, so the discussion below stays attached to that artifact.

The plumbing is straightforward: the coordinator fetches sources and decides what work can leave its own context, Gemma 4 31B-it compresses source shards in parallel, Nemotron 3 Ultra writes from those compressed notes, and the final pass returns to the coordinator for source-sensitive checks. The check is mostly mechanical: find the source sentence behind the number or claim, then keep, correct, or cut it.
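A rough sketch of the keep, correct, or cut decision for claims that carry a figure. This is a deliberately naive string match, not the coordinator's actual check (which is a model pass); the function names, the `subject` argument, and the sample text are all hypothetical.

```python
import re
from typing import Optional


def split_sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def numbers_in(text: str) -> set[str]:
    """Collect integer and decimal figures as strings."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def check_claim(claim: str, subject: str, source_text: str) -> tuple[str, Optional[str]]:
    """Return (verdict, source sentence) for a claim that contains a figure."""
    claim_numbers = numbers_in(claim)
    if not claim_numbers:
        raise ValueError("this sketch only handles claims that contain a figure")
    # Only sentences that mention the subject can support or correct the claim.
    candidates = [s for s in split_sentences(source_text) if subject.lower() in s.lower()]
    for sentence in candidates:
        if claim_numbers <= numbers_in(sentence):
            return "keep", sentence      # every figure appears in one source sentence
    for sentence in candidates:
        if numbers_in(sentence):
            return "correct", sentence   # subject found, but with different figures
    return "cut", None                   # nothing in the source backs the claim


# Made-up source text for illustration only.
source = "Widget lead times were 80 weeks in this made-up sample. Prices were flat."
verdict, sentence = check_claim("Widget lead times were 120 weeks.", "lead times", source)
```

In this example the subject is found but the figure differs, so the verdict is `correct` and the returned sentence shows what the source actually says.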

Methodology & Confidence

The datacenter supply-chain report included more than the main analysis. It also included a method note explaining how sources moved through the chain and a confidence table separating corroborated claims from single-source or low-reliability ones. That helps readers see which parts of the analysis are strong and which parts need caution.
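One way a confidence table like this could be derived from source tags is sketched below. The three tier names follow the distinction drawn above; the thresholds (two or more reliable sources counts as corroborated), the `Claim` type, and the table layout are assumptions, not the report's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    source_tags: list[str]   # tags of the sources that support this claim


def tier_for(claim: Claim, low_reliability: set[str]) -> str:
    """Assign a tier from how many distinct reliable sources back the claim."""
    reliable = [t for t in set(claim.source_tags) if t not in low_reliability]
    if len(reliable) >= 2:
        return "corroborated"
    if len(reliable) == 1:
        return "single-source"
    # Assumption: a claim with no reliable backing lands in the weakest tier.
    return "low-reliability"


def confidence_table(claims: list[Claim], low_reliability: set[str]) -> str:
    """Render claims and their tiers as a markdown table."""
    rows = ["| Claim | Sources | Tier |", "| --- | --- | --- |"]
    for claim in claims:
        tags = ", ".join(sorted(set(claim.source_tags)))
        rows.append(f"| {claim.text} | {tags} | {tier_for(claim, low_reliability)} |")
    return "\n".join(rows)


# Placeholder claims; S3 is marked as a low-reliability source.
table = confidence_table(
    [
        Claim("Placeholder claim one", ["S1", "S2"]),
        Claim("Placeholder claim two", ["S1", "S3"]),
        Claim("Placeholder claim three", ["S3"]),
    ],
    low_reliability={"S3"},
)
```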

01<br>Fetch sources<br>The coordinator gathers live source material and decides how to split the reading.

02<br>Compress notes<br>Gemma preserves figures with subjects, source tags, and reliability hints.

03<br>Synthesize<br>Nemotron writes from supplied notes and is told to keep gaps and hedges visible.

04<br>Verify claims<br>The coordinator checks source-sensitive claims before the report is treated as usable.
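For step 01, the article says only that the coordinator decides how to split the reading. As one possible mechanical baseline, the sketch below balances sources across shards by size with a greedy rule; the function name and the character counts are hypothetical.

```python
def split_into_shards(sizes: dict[str, int], shard_count: int) -> list[list[str]]:
    """Greedily assign source tags to shards so total sizes stay roughly even."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards: list[list[str]] = [[] for _ in range(shard_count)]
    loads = [0] * shard_count
    # Place the largest sources first, each into the currently lightest shard.
    for tag, size in sorted(sizes.items(), key=lambda item: item[1], reverse=True):
        lightest = loads.index(min(loads))
        shards[lightest].append(tag)
        loads[lightest] += size
    return shards


# Made-up character counts per source tag.
shards = split_into_shards({"S1": 9000, "S2": 4000, "S3": 3000, "S4": 2000}, 2)
# shards == [["S1"], ["S2", "S3", "S4"]]
```

A size-based split ignores topic, so a coordinator might reasonably override it to keep related sources in the same shard.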

The workflow does not make research deterministic. It adds friction where mistakes matter: figures carry source tags, weak evidence is labeled instead of smoothed over, and high-impact claims are checked before the report is treated as...

