SubQ – a sub-quadratic LLM built for multi-million token reasoning

Subquadratic — Efficiency is Intelligence Contact salesRequest early access →

The first modelbuilt for long‑context tasks SubQ is a sub-quadratic LLM built for multi-million token reasoning, allowing agents to work across full repositories, long histories, and persistent state without quality loss. Request early access →

Use Cases All your context. Always available. Reason across millions of tokens in one prompt: entire repos, whole artifacts, and long-running agent state, with room to spare at a fraction of the cost. Tokens012M Python source code The entire 3.13 standard library

~5.1M

Six months of React PRs ~1,050 pull requests against the React codebase

~7.5M

~ Approximate token counts.

Architecture Not just another model.An architectural breakthrough. SubQ is the first model built on a fully sub-quadratic sparse-attention architecture. LLMs today waste compute by processing every possible relationship between words, but only a small fraction of these relationships matter. SubQ finds and focuses only on those, ensuring compute is used where it matters most. At 12M tokens, this reduces attention compute almost 1,000×, changing the way LLMs scale. Technical report →

TransformerO(n²) SubQO(n)

Benchmarks A leader in long-context retrieval and reasoning tasks Long context retrieval SubQ has near-perfect performance on single-fact retrieval and multi-task retrieval, both at scale. Multi-task retrievalRULER (128K) 99.12%

128K

Single-fact retrievalNeedle-in-a-haystack (1M–12M) 100%

100%

98%

12M

Reasoning & knowledge SubQ balances long-context retrieval without compromising on reasoning and knowledge. BenchmarkSubQ 1.1 SmallGPT-5.5Opus 4.8Sonnet 4.6GPT-5.4-miniGPT-5.4-nanoHaiku 4.5Graduate-level science GPQA Diamond · pass@1 85.493.29287.587.581.767.2Agentic finance AutomationBench 13%18%16%8%0%n/r3%Competitive programming LiveCodeBench v6 · pass@4 89.79292.288.978.678.269.7 n/r = result not reported by the model provider

Unrivaled efficiency SubQ uses 64.5x less compute than dense attention, and is 56× faster than FlashAttention-2 at 1M-token context.

Third-party validated results →Technical report →

Products Two ways to use SubQ. The full-context API for developers and enterprise teams. Process full repositories and pipeline states in a single API call at linear cost. → 12M token context window → Streaming + tool use → OpenAI-compatible endpoints Request API access →

The long-context layer for coding agents. Plug into Claude Code, Codex, and Cursor to map codebases, gather context, and answer token-heavy questions faster. → Auto-redirects expensive model turns → One-line install Request SubQ Code access →

Research From the lab.

ResearchJune 16, 2026 Introducing SubQ 1.1 Small Read more → PartnershipsMay 14, 2026 We're Partnering with LayerLens to Evaluate SubQ Read more → ProductMay 5, 2026 Introducing SubQ: The First Fully Subquadratic LLM Read more → ResearchUpdated May 15, 2026 How SSA Makes Long Context Practical Read more →

About We built the architecture the industry said wasn't possible. Subquadratic is a frontier AI research and infrastructure company building a new class of LLMs. While other major labs focus on incremental improvements to Transformer models, we're pushing foundational change at the model architecture level — enabling large-context, multi-modal inference that scales efficiently where transformers can't. Built by researchers from Meta Google Oxford Cambridge BYU

Early Access Is your business ready? Build with us. Join the private preview.

Send MessageBY SUBMITTING THIS FORM, YOU AGREE TO OUR PRIVACY POLICY AND CONSENT TO RECEIVE MARKETING COMMUNICATIONS FROM SUBQUADRATIC. YOU CAN UNSUBSCRIBE AT ANY TIME.

SubQ – a sub-quadratic LLM built for multi-million token reasoning

Related Articles

Apple WWDC 2026 Livestream

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

German ruling declares Google liable for false answers in AI Overviews

Britain Became as Poor as Mississippi