Agents Make Engineering Hard Again

Agents Make Engineering Hard Again - ninjapenguin

Intro

I think we’re nearing the end of the “prompt demo” phase of AI. The door is now firmly ajar on the engineering phase, and the exciting news for engineers is that these shiny new agentic systems are bringing all the hard bits back again.[1]

The Great Prompt Illusion

There’s been a lot of focus on the prompt as the product up until this point. It makes sense: the barrier is low, the wow factor high, and the worm on that hook is juicy. The next step on the journey, getting that prompt into production, is . . . cavernous.

That prompt looks so nicely defined sitting all alone in its well-styled textarea, but in the real world? It needs a runtime, model access, tools, state, orchestration, continuous evaluation (testing), deployment and monitoring.[2] The list goes on, and it’s all very predictable from an engineering standpoint.

An Agent? It takes a village (of engineers)

And now Agents. Now that single prompt has autonomy! An Agent requires state (actually, it likely requires a LOT of it). It has dependencies, credentials (probably a lot of these as well; looking at you, MCP), significant side effects, multiple users, costs… the list goes on.[3] Agents are powerful and have the potential for real, impactful business consequences, but they’re also an almost perfectly designed thought experiment for stress-testing every assumption of modern-day infrastructure. So this brand-new, shiny technology . . . means we need to go back and make sure we’re getting all those fundamentals right again.

Return of the Stateful Process

Engineering for scale is hard, and the bar for “scale” keeps moving. In fact, we keep inventing new terms just to get a handle on it: enterprise scale became web scale, web scale became hyperscale, hyperscale became planet scale…

We’ve worked hard and learned a lot. Small, stateless, short-lived, horizontally scalable, easily replaceable processes are what we’ve been aiming for.[4] Containers, Kubernetes, ‘serverless’ and edge compute are the infrastructure tools and approaches we’ve been using to get there.

Agents, significantly, complicate this.

They are often long-running processes with large amounts of working context, mutable workspaces, local codebases, browser sessions, authenticated tools, and an ever-growing backlog of memory.[5] They communicate over long-running streams. They accumulate state. They make decisions based on context, in ways that often don’t fit neatly into a request-response lifecycle.[6]

That is to say: almost the exact opposite of what we have been aiming for.

So what’s the problem?

The problem is not that long-running, stateful systems are impossible. We have been building them forever (they used to be our default!). The problem is that agents make stateful systems feel easy to create and hard to operate.

With their very particular flavour of requirements, Agents in production force us to ask a lot of those familiar questions again. What if this long process gets halfway through and dies? Can it safely resume? Can we adequately explain why it made a specific decision or tool call? How do we control cost? Can its credentials be revoked quickly? Can we reliably tell the difference between a model error, a tool error, a permissions issue, a bad prompt, stale context or a plain vendor outage?
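To make the first two questions a little more concrete, here is a minimal sketch of one common answer: checkpoint progress after every step, so a restarted run can skip work that already finished. It assumes the agent’s work can be split into named steps and that a local JSON file is an acceptable store; `run_steps`, `load_checkpoint` and `save_checkpoint` are hypothetical names, not any framework’s API.

```python
import json
import os
import tempfile
from typing import Callable


def load_checkpoint(path: str) -> dict:
    """Return saved progress, or an empty record if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"completed": [], "results": {}}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file first, then atomically rename it over the old checkpoint."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run_steps(steps: list[tuple[str, Callable[[], str]]], path: str) -> dict:
    """Run named steps in order, skipping any that a previous run already finished."""
    state = load_checkpoint(path)
    for name, step in steps:
        if name in state["completed"]:
            continue  # finished before the crash: don't repeat its side effects
        state["results"][name] = step()
        state["completed"].append(name)
        save_checkpoint(path, state)  # persist progress after every step
    return state
```

One caveat the sketch doesn’t solve: if the process dies after a step’s side effect but before the checkpoint is written, that step runs again on resume, so steps still need to be idempotent.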

The key here is that none of these are AI-specific problems; it’s just that Agents aren’t remotely interested in quacking like the ducks our infrastructure has evolved to handle.
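The last of those questions, telling one kind of failure from another, gets much easier when failures are typed at the source instead of collapsing into one generic error. A minimal sketch, assuming the agent runtime raises a distinct exception type per failure source; the exception classes and the retry policy below are hypothetical illustrations.

```python
from enum import Enum


class FailureKind(Enum):
    MODEL = "model"              # malformed or unusable model output
    TOOL = "tool"                # a tool ran and failed
    PERMISSION = "permission"    # credentials missing, expired or revoked
    VENDOR_OUTAGE = "vendor"     # upstream provider unavailable
    UNKNOWN = "unknown"


# Hypothetical exception types an agent runtime might raise; not from any library.
class ModelOutputError(Exception): ...
class ToolError(Exception): ...
class PermissionDeniedError(Exception): ...
class VendorUnavailableError(Exception): ...


_KINDS = {
    ModelOutputError: FailureKind.MODEL,
    ToolError: FailureKind.TOOL,
    PermissionDeniedError: FailureKind.PERMISSION,
    VendorUnavailableError: FailureKind.VENDOR_OUTAGE,
}

# Assumed policy: re-asking the model or waiting out an outage may help;
# retrying a permissions failure never will, so it should escalate instead.
RETRYABLE = {FailureKind.MODEL, FailureKind.VENDOR_OUTAGE}


def classify(exc: Exception) -> FailureKind:
    """Map an exception to the failure source it represents."""
    for exc_type, kind in _KINDS.items():
        if isinstance(exc, exc_type):
            return kind
    return FailureKind.UNKNOWN


def should_retry(exc: Exception) -> bool:
    return classify(exc) in RETRYABLE
```

Bad prompts and stale context are harder to catch this way, since they may not raise an exception at all.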

And the new problems ARE new

As well as making us revisit those familiar problems, Agents also present a number of genuinely new ones: non-deterministic execution, prompt injection, tool misuse and call failures, hallucinated actions, highly variable latency, wildly unpredictable costs[6] and evaluation difficulty.[7]
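One modest defence against tool misuse and hallucinated actions is to validate every model-proposed tool call before executing it. A minimal sketch, assuming tool calls arrive as a name plus a dict of arguments; the `ALLOWED_TOOLS` registry and its two tools are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]


# Hypothetical registry: tool name -> expected argument names and types.
ALLOWED_TOOLS: dict[str, dict[str, type]] = {
    "read_file": {"path": str},
    "search_web": {"query": str, "max_results": int},
}


def validate_call(call: ToolCall) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    schema = ALLOWED_TOOLS.get(call.name)
    if schema is None:
        return [f"unknown tool: {call.name}"]  # possibly a hallucinated action
    problems = []
    for arg, expected in schema.items():
        if arg not in call.args:
            problems.append(f"missing argument: {arg}")
        elif not isinstance(call.args[arg], expected):
            problems.append(f"{arg} should be {expected.__name__}")
    for arg in call.args:
        if arg not in schema:
            problems.append(f"unexpected argument: {arg}")
    return problems
```

This only catches calls that are malformed or out of scope. A well-formed call to an allowed tool can still be the wrong call, and on its own it does nothing about prompt injection.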

On the face of it, some of these challenges seem diametrically opposed to many of the objectives we’ve been striving toward for years.

“What’s the expected runtime of this new system?” “Somewhere between 2 and 20 minutes”

“What does the output look like?” “Ah, well, that’s technically random, but most of the time it should hopefully be…”

“What’s the expected operating cost?” “We have a figure, but the potential upper bound is 30x.”[6]
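A potential 30x upper bound is exactly the kind of number a hard spend ceiling exists for. A minimal sketch of a per-run guard, assuming token-based pricing; `CostGuard` is a hypothetical name, and the prices and ceiling are placeholders rather than any vendor’s real rates.

```python
class BudgetExceeded(Exception):
    """Raised when the next call could push a run past its hard spend ceiling."""


class CostGuard:
    """Per-run spend tracker with a hard ceiling.

    Prices and ceiling are illustrative placeholders, not real vendor rates.
    """

    def __init__(self, ceiling_usd: float, usd_per_1k_input: float, usd_per_1k_output: float):
        self.ceiling_usd = ceiling_usd
        self.usd_per_1k_input = usd_per_1k_input
        self.usd_per_1k_output = usd_per_1k_output
        self.spent_usd = 0.0

    def cost_of(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens / 1000) * self.usd_per_1k_input + (
            output_tokens / 1000
        ) * self.usd_per_1k_output

    def authorize(self, input_tokens: int, max_output_tokens: int) -> None:
        """Check the worst case before calling the model; raise if it could breach the ceiling."""
        worst_case = self.spent_usd + self.cost_of(input_tokens, max_output_tokens)
        if worst_case > self.ceiling_usd:
            raise BudgetExceeded(
                f"worst case ${worst_case:.2f} exceeds ceiling ${self.ceiling_usd:.2f}"
            )

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add the actual usage reported after the call; return total spend so far."""
        self.spent_usd += self.cost_of(input_tokens, output_tokens)
        return self.spent_usd
```

Checking the worst case before each call, and recording actual usage afterwards, means a run stops before the call that could break the ceiling rather than after it.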

…that would traditionally make for a reasonably rocky start to any production readiness review.

The fundamentals matter so much more now

Security, observability, testing, governance and (my personal favourite) resilience must now, more than ever, become first-class product features for these systems.[8][9][10]

With these systems acting autonomously on behalf of real-world users or businesses, we must understand what they’re doing, why they’re doing it, what has been affected and, critically, how to stop them and recover. The exciting bit? Solving many of these involves some real engineering complexity.
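A minimal sketch of two of those requirements, knowing what has been affected and being able to stop, assuming every agent action passes through a single chokepoint. `SupervisedAgent` and `AuditEntry` are hypothetical names, and a real audit log would need durable storage rather than a Python list.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class AuditEntry:
    timestamp: float
    action: str
    reason: str           # the agent's stated rationale, recorded as given
    affected: list[str]   # resources touched, so they can be reviewed or rolled back


@dataclass
class SupervisedAgent:
    """Wraps agent actions with an append-only audit trail and a stop switch."""

    stop_event: threading.Event = field(default_factory=threading.Event)
    log: list[AuditEntry] = field(default_factory=list)

    def act(self, action: str, reason: str, affected: list[str]) -> bool:
        if self.stop_event.is_set():
            return False  # halted: refuse new actions, leave the log intact
        # Log intent before acting, so a crash mid-action still leaves a record.
        self.log.append(AuditEntry(time.time(), action, reason, affected))
        # ... perform the real side effect here ...
        return True

    def stop(self) -> None:
        """Kill switch: safe to call from another thread."""
        self.stop_event.set()

    def touched(self) -> set[str]:
        """Everything the agent has affected so far: the starting point for recovery."""
        return {res for entry in self.log for res in entry.affected}
```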

The production...
