Why GPU compilers are MORE important in the agentic era


The brain still needs the hammer: Why compilers matter MORE in the agent era, not less

Michael Søndergaard
Tue May 19 2026

Part 2 of a series on why Spectral and SCALE exist.

In Part 1, I argued that cross-vendor portability in accelerated computing must be delivered by a company, rather than a committee, because the implementation is the standard. A reasonable reader would finish that argument by asking the obvious follow-up: fine, but what happens when the implementation gets written by an LLM?

It's a timely question. If agents can write CUDA, the skeptic says, then who needs a CUDA toolchain that runs everywhere? Just point the model at each backend and have it emit native code. CUDA here, ROCm there, SYCL there, etc. The compiler becomes a quaint historical artifact — the assembly programmer of the 2030s.

Let us take that view seriously, because at first hearing it sounds like it dissolves the whole thesis behind Spectral. It doesn't. The opposite, in fact: agentic code generation makes the compiler and runtime stack more valuable, not less. Here's why.

LLMs and compilers do categorically different jobs

A compiler is a deterministic function from source to machine code. Same input, same output, every time, forever. That property is not incidental. It's the entire reason you can trust a billion lines of code running on hardware you've never seen.

An LLM is a probabilistic function from context to plausible tokens. Same input, different output, depending on temperature, sampling, model version, and the phase of the moon. That property is also not incidental — it's what makes the model useful for ideation, exploration, and synthesis. But it is exactly the wrong property at the metal layer.

Nobody wants merely stochastic correctness on FMAs, memory fences, or atomics. Nobody wants a kernel which produces subtly different results across runs because the model got creative about which warp did the reduction. Think of the LLM as the brain and the compiler as the hammer: the brain decides what to compute, the hammer forges the runtime executable, the same way, every time.
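To make the reduction-order point concrete, here is a minimal sketch in plain Python, so it runs without a GPU. The two functions and the input values are illustrative assumptions rather than anything from a real kernel; they stand in for two orders in which partial sums might be combined. Because floating-point addition is not associative, the same data summed in a different order can give a different answer.

```python
import math


def sequential_sum(values):
    """Add left to right, as a single thread walking the array would."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Add in a balanced tree, as a parallel reduction typically would."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Hypothetical data, chosen so that rounding is visible in 64-bit floats.
values = [1e16, 1.0, -1e16, 1.0]

left_to_right = sequential_sum(values)
tree_order = pairwise_sum(values)
exact = math.fsum(values)  # correctly rounded reference sum

print(left_to_right, tree_order, exact)
```

Each order is perfectly repeatable on its own; the two orders simply disagree with each other, and here neither matches the reference sum. Fixing the order once, and keeping it fixed, is the compiler's job rather than something to re-decide on every generation.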

These two things look superficially similar — both produce code — but they sit on opposite sides of a categorical divide. Conflating them is the error.

What agents actually need from a substrate

Once you accept that the agent and the compiler do different jobs, a useful question becomes: what does the agent need from the substrate — the compiler, runtime, and semantics it writes against — in order to be productive?

Three things, mostly.

First, fast and structured feedback. An agent iterating on a kernel needs compile errors it can parse, deterministic failure modes, and reproducible builds. The faster and more legible the feedback loop, the fewer attempts the agent will burn through before it lands on something which works. A toolchain emitting a wall of cryptic template errors is, for an agent, roughly as useful as no toolchain at all.
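As an illustration of "compile errors it can parse", the sketch below turns raw compiler output into structured records an agent could act on. It assumes the `file:line:col: severity: message` layout used by Clang and GCC; whether a particular GPU toolchain emits exactly that layout is an assumption, and the sample output, file name, and `Diagnostic` type are made up for the example.

```python
import re
from dataclasses import dataclass

# Assumed layout: "<file>:<line>:<col>: <severity>: <message>"
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>error|warning|note): (?P<message>.*)$"
)


@dataclass(frozen=True)
class Diagnostic:
    file: str
    line: int
    col: int
    severity: str
    message: str


def parse_diagnostics(output):
    """Return one Diagnostic per line that matches the assumed layout."""
    found = []
    for raw in output.splitlines():
        match = DIAGNOSTIC.match(raw.strip())
        if match:
            found.append(
                Diagnostic(
                    file=match["file"],
                    line=int(match["line"]),
                    col=int(match["col"]),
                    severity=match["severity"],
                    message=match["message"],
                )
            )
    return found


# Hypothetical compiler output; source snippets and carets are ignored.
sample = """kernel.cu:12:5: error: use of undeclared identifier 'tid'
kernel.cu:30:18: warning: unused variable 'stride'
    int stride = blockDim.x;
"""

diagnostics = parse_diagnostics(sample)
errors = [d for d in diagnostics if d.severity == "error"]
```

With records like these, an agent can jump straight to the offending line instead of re-reading the whole log, which is where a legible feedback loop saves attempts.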

Second, one mental model rather than N (one for each vendor/GPU target). Every fragmentation of the substrate fragments the training distribution underneath the agent and multiplies the surface area for hallucination. If the agent has to remember that intrinsics are named one thing on NVIDIA, another on AMD, a third on Intel, and that the memory model has subtly different guarantees on each — its effective competence drops on every one of them. The model that deeply knows one substrate beats the model that knows N substrates shallowly.
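One concrete instance of a cross-vendor difference is execution width: NVIDIA GPUs run 32-thread warps, while many AMD GPUs run 64-wide wavefronts. The sketch below simulates a shuffle-down tree reduction in plain Python to show what happens when a starting offset tuned for 32 lanes is reused on a 64-lane group. The function `shuffle_down_reduce` is a hypothetical simulation, not a real intrinsic, and its handling of out-of-range lanes is a simplification of actual hardware behaviour.

```python
def shuffle_down_reduce(lanes, start_offset):
    """Simulate a shuffle-down tree reduction across one warp or wavefront.

    At each step, lane i adds the value held by lane i + offset. Lanes whose
    partner would fall outside the group keep their current value (a
    simplification). Lane 0 holds the result at the end.
    """
    vals = list(lanes)
    offset = start_offset
    while offset > 0:
        vals = [
            vals[i] + vals[i + offset] if i + offset < len(vals) else vals[i]
            for i in range(len(vals))
        ]
        offset //= 2
    return vals[0]


warp_32 = [1] * 32        # every lane contributes 1, so the true sum is 32
wavefront_64 = [1] * 64   # true sum is 64

ok_32 = shuffle_down_reduce(warp_32, start_offset=16)
# Same hard-coded offset, wider group: the upper 32 lanes are never folded in.
wrong_64 = shuffle_down_reduce(wavefront_64, start_offset=16)
# Deriving the offset from the actual group width fixes it.
ok_64 = shuffle_down_reduce(wavefront_64, start_offset=len(wavefront_64) // 2)

print(ok_32, wrong_64, ok_64)
```

Nothing here fails loudly: the mis-sized reduction just returns half the sum. That is the kind of per-target detail a single, consistent substrate lets the agent stop carrying around.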

Third, an environment that doesn't lie. The agent's productivity is bounded by how quickly it can iterate against something whose behaviour matches its documentation. A toolchain with silent miscompiles, undocumented edge cases, or platform-specific behaviour that only shows up at runtime is a productivity sink whether the developer is human or not. Arguably more so when the developer is an agent, because the agent has no intuition for "that smells wrong, let me check."

Unsurprisingly, these three things are also prerequisites for human developer productivity.

Volume changes the calculus

Here's the part that I think gets underweighted in conversations about agentic coding.

Human-authored GPU code is, in the grand scheme of things, a small corpus: tens of thousands of serious kernels, written mostly by specialists, reviewed mostly by other specialists. The substrate underneath those kernels can afford to be sharp-edged, because the people using it know where the edges are.

Agent-authored GPU code presents a fundamentally different volume of output. Potentially millions of kernels, increasingly directed by people who aren't GPU specialists and shouldn't have to be. The person building a vertical AI product is thinking about what their inference path should...
