
The swarm that designs itself | Peter Bhabra


Faced with a hard task, the instinct is to reach for more: a smarter model, a longer context window, one capable agent that can hold the whole problem in its head at once. The entire frontier is racing along that axis, chasing more intelligence and more context, and the race has handed me a great tool. For deep, sequential problems, a single long-context agent is the best hammer I’ve ever had.

So I reach for it on everything. When it can’t crack a task, I rarely stop to ask whether a hammer was the right tool. I just wait for a bigger one, the next model with a longer window and a higher benchmark. But hand someone a hammer and everything starts to look like a nail. Some tasks were never nails.

For one shape of work, there is another way. A lot of real work isn’t deep and sequential. It’s wide and shardable: audit every file in this repo, review every dependency, document every subsystem, check every source. Point a single long-context agent at that and it’ll get there, but you will pay dearly for the privilege.

That price is why I built the Doubleword Agent Swarm, my open-source reimplementation of the agent swarm Moonshot introduced in the Kimi K2.5 report: an LLM orchestrator designs its own team of bounded-context workers and fans them out in parallel over a task. This post is the story of how I built it, and of what happened when I pointed it at a real codebase, side by side with a single long-context agent.

For wide, shardable work, a swarm of bounded agents beats one long-context agent on cost and on output. The model designs the team. I build the scaffolding, and it’s small.

What the hammer costs

To make this concrete, I picked a job I actually needed done: a security audit of control-layer, Doubleword’s open-source AI gateway (512 source files, about 2.4M tokens of unique source). The brief was to find real vulnerabilities: injection, leaked secrets, broken auth, unsafe file handling. I ran it both ways: once with a single long-context agent, once with the swarm.

I ran the single agent first: Claude Opus with a 1M-token window, no chunking, just “audit the repo”. It works; the trouble is the cost. An agent loop re-sends the growing transcript on every turn, so by the time my metered run had covered ~7% of the repo, it had already burned 27.7M tokens, 95% of them cache reads: the same transcript shipped back again and again.[1] Projected over the full repo, the audit lands around 300M tokens: a 2.4M-token codebase, amplified ×125.

[Figure: Read once (the corpus): 2.4M tokens · Solo agent (projected): ~300M tokens. Reading the 2.4M-token corpus once, against the ~300M a single long-context agent re-sends: a ×125 amplification.]

Reading the codebase once costs 2.4M tokens. The remaining ~297M is the agent re-reading what it has already seen.
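Why re-sending compounds so quickly is easy to see in a toy cost model. This is a minimal sketch under an assumed model, not a reconstruction of the metered run: each turn reads one new file, appends it to the transcript, and re-sends the whole transcript. Total spend then grows with the square of the number of turns, while sharding the same files across bounded workers keeps every transcript short. The helper names (`resend_cost`, `read_once`, `shard`) and the 100-file, 1k-token corpus are hypothetical, and the model ignores cache-read discounts.

```python
# Toy cost model (an assumption for illustration): every turn reads one new
# chunk, appends it to the transcript, and re-sends the whole transcript.

def resend_cost(chunks: list[int], preamble: int = 0) -> int:
    """Total tokens sent when each turn re-sends the growing transcript."""
    transcript = preamble  # system prompt + task, re-sent every turn
    total = 0
    for chunk in chunks:
        transcript += chunk   # the new file joins the transcript...
        total += transcript   # ...and the whole transcript is sent again
    return total


def read_once(chunks: list[int]) -> int:
    """Tokens needed to read every chunk exactly once."""
    return sum(chunks)


def shard(chunks: list[int], n_shards: int) -> list[list[int]]:
    """Split chunks into contiguous slices of near-equal length."""
    size = -(-len(chunks) // n_shards)  # ceiling division
    return [chunks[i:i + size] for i in range(0, len(chunks), size)]


if __name__ == "__main__":
    corpus = [1_000] * 100  # hypothetical: 100 files of 1k tokens each
    once = read_once(corpus)
    solo = resend_cost(corpus)
    swarm = sum(resend_cost(s) for s in shard(corpus, 10))
    print(f"read once: {once:,}")
    print(f"solo:      {solo:,}  (x{solo / once:.1f})")
    print(f"swarm:     {swarm:,}  (x{swarm / once:.1f})")
```

With these made-up sizes, the solo loop sends 5,050,000 tokens (×50.5) and ten bounded workers send 550,000 (×5.5). For n equal-sized files the solo amplification is (n + 1)/2, so it keeps growing with the size of the repo; a worker’s amplification depends only on the size of its own slice.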

The alternative: a swarm

Don’t make one agent re-read everything. Split the repo across many bounded workers, each reading only its own slice, once, all at the same time. That’s an agent swarm.
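Splitting the repo needs some rule for which files each worker gets. One simple option, assumed here rather than taken from doublewordai/swarm, is first-fit-decreasing bin packing against a per-worker token budget. `pack_shards`, `Shard`, and the budget value are hypothetical; per-file token counts would come from whatever tokenizer you use.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    budget: int                                    # max tokens one worker may read
    files: list[str] = field(default_factory=list)
    tokens: int = 0

    def fits(self, size: int) -> bool:
        return self.tokens + size <= self.budget


def pack_shards(files: dict[str, int], budget: int) -> list[Shard]:
    """First-fit decreasing: place each file (largest first) into the first
    shard with room, opening a new shard when none has space."""
    shards: list[Shard] = []
    for path, size in sorted(files.items(), key=lambda kv: kv[1], reverse=True):
        if size > budget:
            raise ValueError(f"{path} ({size} tokens) exceeds the per-worker budget")
        target = next((s for s in shards if s.fits(size)), None)
        if target is None:
            target = Shard(budget)
            shards.append(target)
        target.files.append(path)
        target.tokens += size
    return shards
```

For `{"auth": 60, "db": 50, "config": 30, "util": 20}` with `budget=80`, this yields two full shards: `auth` with `util`, and `db` with `config`. Packing largest-first keeps the worker count low, but it is only a fixed heuristic; in the swarm described here, the decomposition is the orchestrator’s call.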

It’s how every company already works. The CEO is the most capable (and most expensive) person in the building, and the wrong one to personally trawl through every file. So they don’t. They hire specialists with tight remits, hand each a bounded task, and never see the mountain of material those specialists wade through. What comes back is a short, high-level summary. The raw work stays with the specialist. The same setup works here, just with agents: the orchestrator plays CEO, the workers are its specialists, and only their conclusions ever travel back up.
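The CEO pattern maps onto a concurrent fan-out in which each worker’s raw reading stays inside the worker and only a short summary returns. Below is a minimal sketch with Python’s `asyncio`; `worker` and `orchestrate` are hypothetical, and `worker` is a stub standing in for a bounded agent loop against a model endpoint.

```python
import asyncio


async def worker(role: str, files: list[str]) -> str:
    """Hypothetical specialist. A real one would run its own bounded agent
    loop; this stub only pretends to read its slice."""
    raw_notes = [f"read {path} in full" for path in files]  # never leaves here
    await asyncio.sleep(0)  # stand-in for the worker's model calls
    return f"{role}: summary of {len(raw_notes)} file(s)"


async def orchestrate(team: dict[str, list[str]]) -> list[str]:
    """Run every specialist concurrently; the orchestrator keeps only the
    short summaries, never the workers' transcripts."""
    summaries = await asyncio.gather(
        *(worker(role, files) for role, files in team.items())
    )
    return list(summaries)


if __name__ == "__main__":
    team = {"auth-reviewer": ["login", "session"], "secrets-scanner": ["config"]}
    for line in asyncio.run(orchestrate(team)):
        print(line)
```

`asyncio.gather` returns results in the order the coroutines were passed, so summaries line up with the team definition. The orchestrator’s context grows by one line per worker, not by the size of what each worker read.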

Who designs the team?

In February 2026, Moonshot published the Kimi K2.5 technical report.[2] Its agent-swarm result is the framework I built on: scale out, not just up. A trainable orchestrator spawns specialised sub-agents and runs them in parallel. It is trained with PARL (Parallel-Agent Reinforcement Learning), in which only the orchestrator learns and the sub-agents stay frozen. The headline numbers: 4.5× lower latency than a single agent, and +17.8 points on BrowseComp.[3] What’s new is that the swarm designs itself: decomposition and team width are the model’s call, not a hand-written workflow.

What’s in the weights is only the orchestration instinct: how to decompose, delegate, reconcile. The runtime that makes a swarm real (spawn, isolate, parallelise, aggregate) lives in Moonshot’s hosted product, not in the open weights. An open endpoint gives me what it has always given me: messages and tools in, tool calls and text out.
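Building that runtime starts with giving the orchestrator a tool through which it can design its own team, then turning its tool calls into workers. Below is a minimal sketch. The `spawn_team` tool, its schema, `WorkerSpec`, `parse_spawn_call`, and the `max_workers` cap are hypothetical illustrations in the common function-tool style (name, description, JSON Schema parameters), not doublewordai/swarm’s actual interface or the exact Open Responses wire format.

```python
import json
from dataclasses import dataclass

# Hypothetical tool exposed to the orchestrator (illustrative schema only).
SPAWN_TEAM_TOOL = {
    "type": "function",
    "name": "spawn_team",
    "description": "Design a team of bounded workers and run them in parallel.",
    "parameters": {
        "type": "object",
        "properties": {
            "workers": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "role": {"type": "string"},
                        "task": {"type": "string"},
                    },
                    "required": ["role", "task"],
                },
            }
        },
        "required": ["workers"],
    },
}


@dataclass(frozen=True)
class WorkerSpec:
    role: str
    task: str


def parse_spawn_call(arguments: str, max_workers: int = 64) -> list[WorkerSpec]:
    """Turn the model's tool-call arguments (a JSON string) into worker specs.
    The model chooses roles, tasks, and team width; the runtime only caps it."""
    payload = json.loads(arguments)
    workers = payload["workers"]
    if not 1 <= len(workers) <= max_workers:
        raise ValueError(f"team width {len(workers)} outside 1..{max_workers}")
    return [WorkerSpec(role=w["role"], task=w["task"]) for w in workers]
```

The runtime fixes only the guard rails (here, a width cap); everything about the team itself comes from the model’s call.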

That gap became the project: the weights bring the instinct, I build the body. doublewordai/swarm is that body, my from-scratch interpretation of Moonshot’s swarm, built on the Open Responses API, model-agnostic (default: moonshotai/Kimi-K2.6).[4]

The architecture

Everything I kept from the paper, and everything I added, compresses to four principles:

A self-designing orchestrator. The model decides the team and the decomposition, not me.

Bounded local...
