Training SID-1 to beat GPT-5 at search with 1k+ QPS RL

May 20, 2026 • Guest post by Max Rumpf (Co-founder of SID) and Sam Dauncey (Researcher at SID)

Given sufficient search tools and time, humans can find almost anything. We search, read results, adapt, and search again until we find the information we seek.

We're Max and Sam, co-creators of SID-1, an agentic search model that builds upon this idea. As a result of its training, SID-1 nearly doubles recall over classical retrieval pipelines and outperforms frontier LLMs at orders of magnitude lower latency and cost.

SID-1 performance

model             recall   time per question   cost per 1k questions
───────────────   ──────   ─────────────────   ─────────────────────
SID-1 (4x)        0.84     5.5s                $1.40
GPT-5.1 (high)    0.78     131s                $240
SID-1             0.77     5.5s                $0.62
Gemini 3 Pro      0.66     156s                $120
Sonnet 4.5        0.64     35s                 $540
Reranker @10      0.45     0.78s               $0.61
Vector only @10   0.44     0.15s               $0.0098

source: sid.ai/research/sid-1
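To put the "orders of magnitude" claim in numbers, the short Python sketch below divides each frontier model's latency and cost by SID-1's, using only the figures reported in the performance table above. The dictionary layout and the helper name `ratios_vs` are our own illustration, not part of any SID tooling.

```python
# Figures copied from the SID-1 performance table:
# (recall, seconds per question, USD per 1k questions).
RESULTS = {
    "SID-1": (0.77, 5.5, 0.62),
    "GPT-5.1 (high)": (0.78, 131.0, 240.0),
    "Gemini 3 Pro": (0.66, 156.0, 120.0),
    "Sonnet 4.5": (0.64, 35.0, 540.0),
}


def ratios_vs(baseline, results=RESULTS):
    """Return {model: (latency ratio, cost ratio)} relative to `baseline`."""
    _, base_secs, base_usd = results[baseline]
    return {
        name: (secs / base_secs, usd / base_usd)
        for name, (_, secs, usd) in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, (slower, pricier) in ratios_vs("SID-1").items():
        print(f"{name}: {slower:.0f}x slower, {pricier:.0f}x more expensive than SID-1")
```

On these numbers, the three frontier models are roughly 6x to 28x slower and roughly 190x to 870x more expensive per question than single-sample SID-1.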

SID is a research lab for search. We trained SID-1 using large-scale reinforcement learning (RL), and when training became bottlenecked on search latency, we migrated the search backend to turbopuffer. We wrote this post, on invitation from the turbopuffer team, to share how we train SID models using large-scale, synchronous RL rollouts at 1k+ searches per second over 10M+ document corpora across thousands of training steps.

Iterative search > static retrieval

Unlike humans, static retrieval ("RAG") pipelines cannot search iteratively. They run a fixed sequence of steps and return the result, even when it's bad.

static retrieval pipeline

┌──────────┐  ┌─────────────┐  ┌─────────────┐  ┌──────────┐  ┌─────────┐
│ question ├─▶│ LLM rewrite ├─▶│ turbopuffer ├─▶│ reranker ├─▶│ results │
└──────────┘  └─────────────┘  └─────────────┘  └──────────┘  └─────────┘
                (optional)                       (optional)

The conventional fixes are to add more retrieval steps (LLM query rewrites, hybrid search with rank fusion, reranking) or tweak the embedding model or chunking strategy, often at the cost of engineering time, complexity, and brittleness.

None of these fixes addresses the underlying problem: every important decision is hard-coded once, at design time, and applied uniformly to all queries. No fixed set of choices is right for every question, which is why static pipelines often accumulate a long tail of failures.

SID-1 treats search as an iterative process driven by an LLM. It runs over multiple turns, calling tools to gather context until it has enough, then returns a ranked list of documents.

SID-1 retrieval pipeline

┌──────────┐    ┌─────────────┐           ┌─────────┐
│ question ├───▶│    SID-1    ├─ ranked ─▶│ results │
└──────────┘    └┬┬┬───────▲▲▲┘           └─────────┘
                 │││       │││
       tool calls│││   n   │││content +
   BM25, ANN, etc│││ turns │││metadata
                 │││       │││
                ┌▼▼▼───────┴┴┴┐
                │ turbopuffer │
                └─────────────┘

This iterative process corrects the fundamental problem of static retrieval. Every design decision is now made by a model that adapts its approach to each query. Like a human, the model decides which tools to use, how to phrase queries, and when to stop searching. As a result, SID-1 outperforms classical embedding-reranking pipelines on recall.
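The control flow described above can be sketched in a few lines of Python. This is an illustrative skeleton, not SID-1's actual harness: the `Action` type, the `policy` callable standing in for the model, the `tools` mapping, and the `max_turns` budget are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """One model decision: run another search, or finish with a ranking."""
    kind: str                     # "search" or "finish"
    tool: str = ""                # hypothetical tool name, e.g. "bm25" or "ann"
    query: str = ""
    ranked: List[str] = field(default_factory=list)


def iterative_search(
    question: str,
    policy: Callable[[str, List[dict]], Action],  # stand-in for the model
    tools: Dict[str, Callable[[str], List[str]]],
    max_turns: int = 8,                           # assumed turn budget
) -> List[str]:
    history: List[dict] = []
    for _ in range(max_turns):
        # The policy sees the question plus everything retrieved so far and
        # chooses the tool, the query wording, and whether to stop.
        action = policy(question, history)
        if action.kind == "finish":
            return action.ranked
        results = tools[action.tool](action.query)
        history.append(
            {"tool": action.tool, "query": action.query, "results": results}
        )
    # Budget exhausted: fall back to documents in the order first retrieved.
    seen: List[str] = []
    for turn in history:
        for doc_id in turn["results"]:
            if doc_id not in seen:
                seen.append(doc_id)
    return seen
```

The contrast with a static pipeline is that nothing about the search plan lives in this loop: the tool, the query text, and the stopping point are all outputs of `policy`, so they can differ from one question to the next.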

This resembles today's agentic search, in which frontier LLMs progressively search and reason over new context. SID-1's training, however, makes it significantly more efficient than frontier LLMs at using search tools and reasoning across results. That is why SID-1 achieves higher recall than much slower and more expensive frontier models given the same expert prompting and harness.

This also makes SID-1 a strong subagent inside a frontier-model-led task. When a frontier model searches directly, every retrieved document...
