Training SID-1 to beat GPT-5 at search with 1k+ QPS RL
May 20, 2026 • Max Rumpf (Co-founder of SID), Sam Dauncey (Researcher at SID) • guest post
Given sufficient search tools and time, humans can find almost anything. We<br>search, read results, adapt, and search again until we find the information we<br>seek.
We're Max and Sam, co-creators of<br>SID-1, an agentic search model<br>that builds upon this idea. As a result of its training, SID-1 nearly doubles<br>recall over classical retrieval pipelines and outperforms frontier LLMs at<br>orders of magnitude lower latency and cost.
SID-1 performance
model │ recall │ time per question │ cost per 1k questions<br>─────────────── ───────────────────────── ───────────────────────── ─────────────────────────
SID-1 (4x) │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 0.84 │▓ 5.5s │▓ $1.40<br>GPT-5.1 (high) │░░░░░░░░░░░░░░░░ 0.78 │░░░░░░░░░░░░░░░░ 131s │░░░░░░░░ $240<br>SID-1 │▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 0.77 │▓ 5.5s │ $0.62<br>Gemini 3 Pro │░░░░░░░░░░░░░ 0.66 │░░░░░░░░░░░░░░░░░░░ 156s │░░░░ $120<br>Sonnet 4.5 │░░░░░░░░░░░░░ 0.64 │░░░░░ 35s │░░░░░░░░░░░░░░░░░░ $540<br>Reranker @10 │░░░░░░░░░ 0.45 │ 0.78s │ $0.61<br>Vector only @10 │░░░░░░░░ 0.44 │ 0.15s │ $0.0098
source: sid.ai/research/sid-1
SID is a research lab for search. We trained SID-1 using<br>large-scale reinforcement learning (RL), and when training became bottlenecked<br>on search latency, we migrated the search backend to turbopuffer. We wrote this<br>post, on invitation from the turbopuffer team, to share how we train SID models<br>using large-scale, synchronous RL rollouts at 1k+ searches per second over 10M+<br>document corpora across thousands of training steps.
Iterative search > static retrieval
Unlike humans, static retrieval ("RAG") pipelines cannot search iteratively.<br>They run a fixed sequence of steps and return the result, even when it's bad.
static retrieval pipeline
┌──────────┐  ┌─────────────┐  ┌─────────────┐  ┌──────────┐  ┌─────────┐<br>│ question ├─▶│ LLM rewrite ├─▶│ turbopuffer ├─▶│ reranker ├─▶│ results │<br>└──────────┘  └─────────────┘  └─────────────┘  └──────────┘  └─────────┘<br>                (optional)                       (optional)
The conventional fixes are to add more retrieval steps (LLM query rewrites,<br>hybrid search with rank fusion, reranking) or tweak the embedding model or<br>chunking strategy, often at the cost of engineering time, added complexity, and<br>brittleness.
None of these fixes addresses the underlying problem: every important decision is<br>hard-coded once, at design time, and applied uniformly to all queries. No fixed<br>set of choices is right for every question, which is why static pipelines often<br>accumulate a long tail of failures.
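To make the "hard-coded at design time" point concrete, here is a minimal sketch of a static hybrid pipeline that fuses a keyword ranking and a vector ranking with reciprocal rank fusion (RRF). It is an illustration only, not SID's or turbopuffer's implementation: the function names, the toy document IDs, and the constants `RRF_K` and `TOP_N` are all hypothetical.

```python
from collections import defaultdict

# Design-time decisions: fixed once, applied to every query (hypothetical values).
RRF_K = 60   # RRF smoothing constant; 60 is a commonly used default
TOP_N = 3    # number of fused results to return


def reciprocal_rank_fusion(rankings, k=RRF_K):
    """Fuse several ranked lists of doc IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def static_pipeline(bm25_ranking, vector_ranking):
    """One fixed pass: fuse, truncate, return. No feedback, no retries."""
    return reciprocal_rank_fusion([bm25_ranking, vector_ranking])[:TOP_N]


# Toy rankings standing in for real BM25 and ANN results.
bm25 = ["d1", "d2", "d3"]
vector = ["d2", "d4", "d1"]
fused = static_pipeline(bm25, vector)
print(fused)
```

Every knob here is a module-level constant, so a query that needs a deeper candidate list or a different balance between keyword and vector results gets the same treatment as every other query.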
SID-1 treats search as an iterative process driven by an LLM. It runs over<br>multiple turns, calling tools to gather context until it has enough, then<br>returns a ranked list of documents.
SID-1 retrieval pipeline
┌──────────┐    ┌─────────────┐           ┌─────────┐<br>│ question ├───▶│    SID-1    ├─ ranked ─▶│ results │<br>└──────────┘    └┬┬┬───────▲▲▲┘           └─────────┘<br>                 │││       │││<br>       tool calls│││   n   │││content +<br>   BM25, ANN, etc│││ turns │││metadata<br>                 │││       │││<br>                ┌▼▼▼───────┴┴┴┐<br>                │ turbopuffer │<br>                └─────────────┘
This iterative process corrects the fundamental problem of static retrieval.<br>Every design decision is now made by a model that adapts its approach to each<br>query. Like a human, the model decides which tools to use, how to phrase<br>queries, and when to stop searching. As a result, SID-1 outperforms classical<br>embedding-reranking pipelines on recall.
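The control flow of such a loop can be sketched in a few lines. This is a toy under stated assumptions, not SID-1's actual tools, prompts, or stopping logic, none of which are described here: `keyword_search` is a stand-in for a real search tool, `scripted_policy` is a hypothetical placeholder for the model, and the corpus is made up.

```python
from dataclasses import dataclass, field

# Toy corpus standing in for a real index (hypothetical data).
CORPUS = {
    "d1": "turbopuffer is a search engine built on object storage",
    "d2": "reinforcement learning trains a policy from rewards",
    "d3": "SID-1 is an agentic search model trained with reinforcement learning",
}


def keyword_search(query, top_k=2):
    """Stand-in search tool: rank docs by number of shared query terms."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((-overlap, doc_id))
    return [doc_id for _, doc_id in sorted(scored)[:top_k]]


@dataclass
class SearchState:
    question: str
    seen: list = field(default_factory=list)  # doc IDs in order of discovery


def scripted_policy(state, turn):
    """Placeholder for the model: returns the next query, or None to stop.

    A real agent would choose the tool, the query, and the stopping point
    from the question and the documents read so far.
    """
    queries = ["agentic search model", "reinforcement learning rewards"]
    return queries[turn] if turn < len(queries) else None


def iterative_search(question, policy, max_turns=5):
    state = SearchState(question)
    for turn in range(max_turns):
        query = policy(state, turn)
        if query is None:  # the policy decides when it has enough
            break
        for doc_id in keyword_search(query):
            if doc_id not in state.seen:
                state.seen.append(doc_id)
    return state.seen  # discovery order; a real model would rerank this list


results = iterative_search("How was SID-1 trained?", scripted_policy)
print(results)
```

The second query surfaces a document the first one missed, which a single fixed pass with the first query alone would never have returned. In this sketch the policy is a fixed script; the point of the structure is that the query choice and the stopping decision live in one replaceable function rather than in pipeline constants.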
This is similar to today's agentic search, where frontier LLMs<br>progressively search and reason over new context. SID-1's training, however,<br>makes it significantly more efficient than frontier LLMs at using search tools<br>and reasoning across results, which is why SID-1 achieves higher recall than<br>much slower and more expensive frontier models given the same expert prompting<br>and harness.
This also makes SID-1 a strong subagent inside a frontier-model-led task. When a<br>frontier model searches directly, every retrieved document...