The Rise of the Agent Runtime — Golem
The dominant use of AI in 2026 is a coding agent—even though almost none of the people using AI think of themselves as programmers, and almost none of them ever see a line of code.
This shift is invisible to users, but it is breaking the infrastructure beneath them. Every major vendor is now quietly rebuilding fragments of the same missing layer—a runtime sized for the agent—and none of them ships the whole thing.
In this post, I’ll explain what’s breaking, why everyone is rebuilding the same kernel badly, and what we built Golem to do about it.
https://golem.cloud/blog/the-rise-of-the-agent-runtime/
Billions of Programs, Invisibly Written
A salesperson at BBVA starts her day. A customer is on the calendar for eleven. She types one line into ChatGPT Enterprise: pull a one-page summary of this client’s last 90 days, flag anything unusual. Sixty seconds later, the summary is on her screen. She skims it, edits two lines, and walks to the meeting.
What she doesn’t know is that a TypeScript program was just written for her. It pulled in three npm packages it had never seen before. It ran, got edited once, and ran again. She knows only that the summary arrived.
According to OpenAI’s State of Enterprise AI 2025, BBVA “regularly uses more than 4,000 GPTs.” Across the report’s enterprise sample, weekly users of Custom GPTs and Projects grew roughly 19× year-to-date. About one in five enterprise ChatGPT messages now flows through a Custom GPT or Project. Coding-related messages from non-engineering functions are up 36%.
The conventional framing says this is another generation of productivity software. Useful, broad, unremarkable.
The conventional framing is wrong.
What BBVA actually has—without ever intending it—is countless small programs being authored, and re-authored, every single week, on behalf of people who do not think of themselves as programmers. A salesperson asks for a customer summary. An analyst asks for a pricing diff. A marketing lead asks for a brochure assembled from a folder of headshots. In every case, a small program is written to produce the result, and the user receives the result—never the program.
Every one of those programs runs on some infrastructure. Inside some sandbox (or not). Against some pile of data. Under some authority. Journaling an audit trail (or not).
Now multiply that picture by every Fortune 500 in the world.
The real shift isn’t happening in the message traffic above. It’s happening in the substrate beneath. And the substrate was never built for this.
Where the Substrate Is Breaking
The substrate has serious problems, and they span security, isolation, reliability, and governance. Let’s take them one at a time.
Security: The Trifecta Outruns Detective Controls
The MIT NANDA initiative’s 2025 study of generative-AI pilots reported that 95% delivered no measurable return, with brittle integrations and shadow AI both named among the dominant failure modes. The OWASP LLM Top 10 (2025 edition) catalogs the steadily accreting failure modes: prompt injection, sensitive-information disclosure, supply-chain compromise, excessive agency.
Simon Willison gave the structural problem its three-noun summary: the lethal trifecta—an agent that reads attacker-controlled input, holds privileged tools, and can phone home. The whole industry now uses the term, because the whole industry has the problem.
Here’s the uncomfortable part: new universal jailbreaks ship at the model layer every quarter, and detective controls cannot keep up with that cadence. You cannot filter your way out of a structural vulnerability.
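To make "structural" concrete, here is a minimal sketch of a preventive check, as opposed to a detective filter. It assumes a runtime that knows, before an agent starts, which of the three trifecta capabilities the session holds. The `AgentCapabilities` shape and the `hasLethalTrifecta` and `admit` functions are hypothetical names invented for this illustration, not part of any real product or library.

```typescript
// Hypothetical capability flags for one agent session (illustrative names only).
interface AgentCapabilities {
  readsUntrustedInput: boolean; // e.g. web pages, inbound email, uploaded files
  hasPrivilegedTools: boolean;  // e.g. private data or side-effecting APIs
  canEgress: boolean;           // e.g. arbitrary outbound network calls
}

// True only when all three legs of the trifecta are present at once.
function hasLethalTrifecta(c: AgentCapabilities): boolean {
  return c.readsUntrustedInput && c.hasPrivilegedTools && c.canEgress;
}

// Preventive control: refuse to start a session that holds all three,
// instead of trying to detect a bad prompt after the fact.
function admit(c: AgentCapabilities): AgentCapabilities {
  if (hasLethalTrifecta(c)) {
    throw new Error("refused: session combines untrusted input, privileged tools, and egress");
  }
  return c;
}
```

The point of the sketch is that the check never inspects a prompt. Removing any one leg (for example, denying egress to a session that reads untrusted input) changes the answer regardless of which jailbreak shipped this quarter.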
Isolation: Too Small, or Too Heavy
Cloudflare’s Workers platform caps each isolate at 128 MB of memory. That’s efficient for stateless request-response, but hopeless for an agent that must hold its own working memory, install dependencies, and run them. In June 2025, Cloudflare added container-class Sandboxes alongside the V8 isolates; they reached general availability in April 2026. Even Cloudflare concluded that lightweight sandboxing alone was not enough.
Anthropic approached the same gap from the other side. The post that open-sourced the Claude Code sandbox describes a runtime designed to give agents an isolation surface without paying a container’s startup and management cost.
Notice what just happened: Cloudflare moved toward heavier isolation by adding containers, and Anthropic moved toward lighter isolation by stripping the sandbox down. They were converging on the same missing runtime tier—from opposite directions.
Reliability: Side Effects Get Re-Issued
Agents now execute side-effecting tools: refunds, emails, payments, travel bookings. The runtime beneath them rarely guarantees that those calls happen exactly once, so an agent that crashes or is interrupted mid-task and is then resumed can re-issue the refund.
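One common way to avoid the duplicate refund is to journal each side effect under a stable key and replay the recorded result on resume. The sketch below shows that idea under loud assumptions: `EffectJournal`, `runOnce`, and the `refund:order-123` key are hypothetical names, and the journal is an in-memory `Map` standing in for the durable storage a real runtime would need.

```typescript
type Effect<T> = () => Promise<T>;

class EffectJournal {
  // Assumption: in-memory only. A real runtime would persist this durably.
  private readonly completed = new Map<string, unknown>();

  // Runs the effect at most once per key; later calls replay the stored result.
  async runOnce<T>(key: string, effect: Effect<T>): Promise<T> {
    if (this.completed.has(key)) {
      return this.completed.get(key) as T;
    }
    const result = await effect();
    this.completed.set(key, result);
    return result;
  }
}

// Simulates an agent that issues a refund, is interrupted, and replays the step.
async function demo(): Promise<number> {
  const journal = new EffectJournal();
  let refundsIssued = 0;
  const issueRefund: Effect<string> = async () => {
    refundsIssued += 1;
    return "refund-ok";
  };
  await journal.runOnce("refund:order-123", issueRefund);
  await journal.runOnce("refund:order-123", issueRefund); // replay after resume
  return refundsIssued;
}
```

This sketch still leaves a window: a crash after the effect runs but before the journal entry is written would repeat the call. Closing that window usually means also passing the same key to the downstream service as an idempotency key, where the service supports one.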
Inngest’s February 2026 essay Durable Execution: The Key to Harnessing AI Agents in Production—whose central...