Ultracoding: The Next Frontier


Jay Hack

The successor to vibe coding is ultracoding: let agents write code to programmatically spawn copies of themselves. Dynamically spin up multi-agent hierarchies in a task-dependent manner and, in doing so, scale up to previously unheard of tasks. Ride the dragon of exponential productivity.

It feels like the jump from single-threaded scripts to MapReduce and Spark: fan out across many workers, add reduce/verify steps, and get orders of magnitude higher throughput.

This is what the future of building software is going to look like. Meta-harnesses like Claude workflows are the path to scaling up to massive multi-agent hierarchies capable of a fundamentally new category of tasks, in software and beyond.

Operating at a new scale

This pattern of LLMs recursively invoking themselves has previously demonstrated impressive results on academic benchmarks - see RLMs.

Recently, however, we've seen several impressive demonstrations in the wild in rapid succession, specifically for large code refactors and 0-1 projects.

Recent massive refactors and 0-1 builds demonstrated in the wild:

Bun's refactor from Zig to Rust

Monty refactor to subprocess pool

Cursor building a browser from scratch with a swarm of agents

Exact implementation details for the above are light, but we can infer that each was accomplished via a swarm of agents working in parallel, managed by a small number of humans in a custom harness. The commonality is that each task has high test coverage and therefore lends itself to horizontally-scalable "ralph-loops" (now a first-class primitive in tools like Codex's /goal) and human verification.
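To make the "ralph-loop" idea concrete, here is a minimal sketch, assuming the loop simply re-invokes an agent against the same goal until the test suite passes or a budget runs out. The names `ralph_loop`, `run_agent`, and `run_tests` are hypothetical, and the agent and test runner are stand-ins so the example runs without a model; none of this is taken from any of the projects above.

```python
from typing import Callable, Tuple


def ralph_loop(
    run_agent: Callable[[str], None],
    run_tests: Callable[[], Tuple[bool, str]],
    goal: str,
    max_iterations: int = 10,
) -> int:
    """Re-invoke the agent until the tests pass; return the attempts used.

    Raises RuntimeError if the iteration budget is exhausted first.
    """
    prompt = goal
    for attempt in range(1, max_iterations + 1):
        run_agent(prompt)
        passed, report = run_tests()
        if passed:
            return attempt
        # Feed the failure report back so the next attempt has fresh context.
        prompt = f"{goal}\n\nThe test suite is still failing:\n{report}"
    raise RuntimeError(f"tests still failing after {max_iterations} attempts")


# Stand-ins (assumptions): each agent call "fixes" one failing test.
failing = ["test_parse", "test_emit", "test_cli"]


def fake_agent(prompt: str) -> None:
    if failing:
        failing.pop()


def fake_tests() -> Tuple[bool, str]:
    return (not failing, ", ".join(failing))


attempts = ralph_loop(fake_agent, fake_tests, goal="Make the suite pass")
print(attempts)
```

Because the only shared state is the test suite, many such loops can plausibly run side by side on disjoint parts of a codebase, which is what makes high test coverage the common thread.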

Code Mode as a Multi-agent Substrate

A key enabler for this emerging pattern is agent proficiency at "code mode" - programmatically invoking tools via code execution.

The latest generation of LLMs is RL'd to operate specifically in this manner. It's a more efficient way to act on the world: the agent can compose bespoke bulk actions at runtime instead of making one tool call at a time, which effectively lets agents assemble their own tools.
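As a rough illustration of code mode, the sketch below exposes a tool surface as ordinary Python methods and executes a model-written script against it. Everything here is an assumption for illustration: `IssueTracker`, `run_agent_code`, and `AGENT_SCRIPT` are hypothetical names, the script is hard-coded rather than generated by a model, and a real harness would run it in a sandbox rather than a bare `exec`.

```python
from dataclasses import dataclass, field


@dataclass
class IssueTracker:
    """Hypothetical in-memory tool surface exposed to the agent's code."""

    issues: dict = field(default_factory=dict)
    calls: int = 0  # counts individual tool invocations

    def list_issues(self, label: str) -> list:
        self.calls += 1
        return [i for i, meta in self.issues.items() if label in meta["labels"]]

    def close_issue(self, issue_id: int) -> None:
        self.calls += 1
        self.issues[issue_id]["open"] = False


def run_agent_code(source: str, tools: IssueTracker) -> dict:
    """Execute model-written code with the tools in scope; return its namespace."""
    namespace = {"tools": tools}
    exec(source, namespace)  # assumption: a real harness would sandbox this
    return namespace


# What a model might emit in a single turn: a bulk action composed at runtime.
AGENT_SCRIPT = """
stale = tools.list_issues("stale")
for issue_id in stale:
    tools.close_issue(issue_id)
closed = len(stale)
"""

tracker = IssueTracker(
    issues={
        1: {"labels": ["stale"], "open": True},
        2: {"labels": ["bug"], "open": True},
        3: {"labels": ["stale", "bug"], "open": True},
    }
)
result = run_agent_code(AGENT_SCRIPT, tracker)
print(result["closed"], tracker.calls)
```

The point of the sketch is the ratio: one model turn produces one script, and that script makes as many tool calls as the task needs without a model round trip between them.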

Voyager (2023) had GPT-4 write, invoke, and store its own code as the action space, accumulating a library of reusable skills.

This pattern was introduced by Voyager, and Perplexity/Cloudflare/many others have since introduced code mode-oriented interfaces. OpenAI and Anthropic even expose this tool calling method in their APIs via simple config (1, 2).

Historically, multi-agent harnesses have been hard-coded, establishing an explicit hierarchy of agents with different roles and communication patterns. Ultracoding, like workflows, cedes this territory to the bitter lesson and acknowledges that agents can dynamically determine the best meta-harness at runtime. Infra-wise, this only requires the addition of a "spawn agent" tool within an existing (persistent) code mode execution environment.
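A minimal sketch of what that could look like, assuming the "spawn agent" tool is just a callable that takes a task string and returns the child's final answer. The names `fan_out`, `orchestrate`, and `stub_model` are hypothetical; in a real code mode environment, the body of `orchestrate` is the part the parent agent would write at runtime, and `stub_model` would be a call to an actual model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

SpawnAgent = Callable[[str], str]


def fan_out(spawn_agent: SpawnAgent, tasks: List[str], max_workers: int = 4) -> List[str]:
    """Run one child agent per task in parallel; results keep task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(spawn_agent, tasks))


def orchestrate(spawn_agent: SpawnAgent, files: List[str]) -> str:
    """The kind of task-specific harness a parent agent might write on the fly."""
    # Map step: one child per file.
    reports = fan_out(spawn_agent, [f"port {path} to the new API" for path in files])
    # Reduce/verify step: a single child reviews the combined reports.
    return spawn_agent("verify and summarize:\n" + "\n".join(reports))


def stub_model(task: str) -> str:
    """Stand-in for a real agent invocation (assumption, no model is called)."""
    return f"done: {task}"


summary = orchestrate(stub_model, ["a.py", "b.py", "c.py"])
print(summary)
```

Nothing about the hierarchy is fixed in the harness itself: a different task would yield a different `orchestrate`, which is the sense in which the structure is decided at runtime.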

This ability, to spin up a harness in a task-dependent manner at runtime, has radically reduced the barrier to entry and means you can realistically chat your way to a massive refactor or ambitious 0-1 project.

Scaling Human-Agent Hierarchies: The UX

Massive multi-agent hierarchies are unlocked from a capabilities perspective - now, the major barrier to widespread adoption is the lack of good UX for the human in the loop.

As Swyx has noted, the UX patterns for ultracoding are nascent. There's no established way to view/triage incremental outputs. The two patterns that have dominated thus far are agent lists and Kanban boards; however, this is clearly not a terminal state.

Devin Desktop shows a kanban board; Claude is just a list of agents.

I think we will imminently move towards a model where the agent expresses a UI for human oversight as part of the meta-harness. This may look like hooking into an existing UI like ClickUp or Linear, or alternatively writing bare HTML in a completely bespoke workflow in case bulk approvals or triage are necessary for the human.

In the fullness of time, agents will effectively dynamically code oversight applications for human orchestrators, directly hooked up to "workflows" and with bespoke approval and triage flows baked in.
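Here is a minimal sketch of such an oversight slice, assuming the run's results live in a SQLite table and the human's decision arrives through a simple prompt. The `results` schema, the function names, and the sample rows are all hypothetical; the approval step takes an `ask` callable so the example runs non-interactively, and passing the built-in `input` would make it block on a real person.

```python
import html
import sqlite3
from typing import Callable, List, Tuple

Row = Tuple[str, str, str]  # (task, status, summary), an assumed schema


def load_results(conn: sqlite3.Connection) -> List[Row]:
    """Fetch the run's results, failures first."""
    query = "SELECT task, status, summary FROM results ORDER BY status, task"
    return conn.execute(query).fetchall()


def render_triage_panel(rows: List[Row]) -> str:
    """Render a bare HTML table, escaping everything the agents wrote."""
    body = "\n".join(
        f"<tr class='{html.escape(status)}'><td>{html.escape(task)}</td>"
        f"<td>{html.escape(status)}</td><td>{html.escape(summary)}</td></tr>"
        for task, status, summary in rows
    )
    return f"<table>\n<tr><th>Task</th><th>Status</th><th>Summary</th></tr>\n{body}\n</table>"


def approval_gate(rows: List[Row], ask: Callable[[str], str]) -> bool:
    """Block until the human answers; only an explicit 'y' approves."""
    failed = sum(1 for _, status, _ in rows if status == "failed")
    answer = ask(f"{len(rows)} results, {failed} failed. Approve? [y/N] ")
    return answer.strip().lower() == "y"


# Sample data standing in for a real run (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (task TEXT, status TEXT, summary TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("port parser", "passed", "all tests green"),
        ("port lexer", "failed", "2 tests failing <see log>"),
    ],
)

rows = load_results(conn)
panel = render_triage_panel(rows)
approved = approval_gate(rows, ask=lambda prompt: "n")  # pass `input` to ask a person
print(panel)
print("approved:", approved)
```

Control only returns to the workflow once `approval_gate` has an answer, which is the bulk approval and triage behavior described above in its simplest form.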

A slice of the meta-harness writes its own oversight UI: a code block queries the run's SQLite results and renders a bespoke triage panel, then blocks on human approval before control returns to the workflow.

Ultrawork

I think about agents for general-purpose knowledge work and the analogies to code. From what I know of our customers at ClickUp it's obvious: this same pattern applies to many workstreams that emerge in recruiting, sales, project management, legal services, accounting, etc.

This pattern of dynamic multi-agent hierarchies will wash out over knowledge work more generally. Instead of babysitting chat loops, you spin up a bespoke app for the task on the spot, with a UI built to verify the task in aggregate. The stuff that lives in a spreadsheet today becomes an application the...
