Golden Paths Weren't Built for Agents

Part 1: How Agentic Development Breaks Control


Everyone's talking about the blast radius of agents. Put them in a container. Put them in a sandbox. Scope their credentials. Give them their own cloud account so they can't reach anything that matters.

It's good advice. We give a version of it ourselves. "Put them in a sandbox" is a good start.

But containing where an agent can act and governing what it's allowed to do are two different problems, and the second one is where you see agentic development start to break developer self-service and infrastructure orchestration. You can put an agent in the most isolated cloud account in the world. The moment it needs to do real work inside that account, it runs into a permission model that was broken years before anyone said the word "agentic."

This is the first in a short series on what agentic development breaks in DevOps. I want to start with control, because it's the one most teams are sure they've already handled.

Self-service was never self-service

To manage a resource, you have to be granted access to that resource. For that to happen, the resource has to exist. For it to exist, someone with more privileges than you had to create it first. So either you hand developers the ability to create resources freely and you've lost any real control, or you gate creation behind an operator and call whatever's left "self-service."

It was never self-service. Permission models built around the identity of a resource undermine self-service by definition. It's resource-provenance hell or ticket ops.

Take a common day-two task that's out of reach for most "self-service" pipelines. A developer owns a Postgres database in production. Postgres 14 is going EOL. A major version upgrade isn't a setting in a dropdown; you stand up a new instance on the new version, migrate the data, cut over, and decommission the old one. You build it, you run it. That's the promise.

Except to stand up that new instance, the developer needs permission to create a database. Not to manage the one they already own, but to create the new one. The one that doesn't exist yet. Of the big three, only AWS can really gate this by attribute, and only if every resource is tagged perfectly the moment it's born and you've locked down who can change tags; GCP and Azure can't condition the create on a tag the resource doesn't have yet, so you fall back to a project- or subscription-level grant. The grant model assumes the resource comes before the permission, and a day-two task like a database migration inverts that order every time.
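To make the AWS case concrete, here is a rough sketch of what tag-conditioned creation can look like, written as a Python dict that serializes to an IAM policy document. The tag key `team` and the choice of RDS actions are assumptions for illustration, not something taken from a real environment, and whether each action honors these condition keys should be checked against the service's own documentation. The sketch also leaves out the second half of the requirement, locking down who can change tags afterward.

```python
import json

TAG_KEY = "team"  # hypothetical tag key, chosen only for this example

# Policy variable that resolves to the caller's own "team" principal tag.
CALLER_TEAM = "${aws:PrincipalTag/" + TAG_KEY + "}"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Creation is allowed only when the request itself carries the
            # caller's team tag, so the resource is tagged the moment it is born.
            "Sid": "CreateOnlyIfTaggedAsCallersTeam",
            "Effect": "Allow",
            "Action": ["rds:CreateDBInstance", "rds:AddTagsToResource"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {"aws:RequestTag/" + TAG_KEY: CALLER_TEAM}
            },
        },
        {
            # Day-two management is scoped by the tag on the existing resource,
            # not by its ID, so it keeps working after a recreate.
            "Sid": "ManageOnlyWhatCarriesCallersTeamTag",
            "Effect": "Allow",
            "Action": ["rds:ModifyDBInstance", "rds:DeleteDBInstance"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {"aws:ResourceTag/" + TAG_KEY: CALLER_TEAM}
            },
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Note how much rides on the tag: if a resource is created without it, or someone can rewrite it later, the second statement either locks the owner out or lets the wrong principal in.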

So the developer files a ticket. An operator with elevated credentials creates the instance. The operator grants access. The developer finishes the migration they were perfectly capable of running themselves. You built a self-service platform and the most routine day-two operation in existence still detours through a human.
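The detour above can be modeled in a few lines. This is a toy permission model, not any cloud provider's API: the class `ResourceScopedIAM`, the principals `dev` and `operator`, and the database names are all made up for illustration. The only rule it encodes is the one from the text, that a grant has to point at a resource that already exists.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceScopedIAM:
    """Toy model: every grant names a specific, already-existing resource."""

    resources: set = field(default_factory=set)
    grants: set = field(default_factory=set)    # (principal, resource_id) pairs
    creators: set = field(default_factory=set)  # principals with broad create rights

    def grant(self, principal: str, resource_id: str) -> None:
        # A grant can only reference something that exists.
        if resource_id not in self.resources:
            raise LookupError(f"cannot grant on {resource_id!r}: it does not exist yet")
        self.grants.add((principal, resource_id))

    def create(self, principal: str, resource_id: str) -> None:
        # Creation has no resource to scope against, so it needs a broad privilege.
        if principal not in self.creators:
            raise PermissionError(f"{principal} may not create {resource_id!r}: file a ticket")
        self.resources.add(resource_id)

    def can_manage(self, principal: str, resource_id: str) -> bool:
        return (principal, resource_id) in self.grants


iam = ResourceScopedIAM(resources={"orders-db-old"}, creators={"operator"})
iam.grant("dev", "orders-db-old")        # the developer manages the database they own

try:
    iam.grant("dev", "orders-db-new")    # cannot pre-authorize the replacement
except LookupError as err:
    print(err)

try:
    iam.create("dev", "orders-db-new")   # and cannot create it either
except PermissionError as err:
    print(err)

iam.create("operator", "orders-db-new")  # the ticket: an operator creates the instance...
iam.grant("dev", "orders-db-new")        # ...and grants access after the fact
print(iam.can_manage("dev", "orders-db-new"))
```

The developer ends up with exactly the access they needed from the start; the model just had no way to express it before the resource existed.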

Kubernetes has the same problem. A RoleBinding ties a subject to a Role inside a namespace, which is fine until the workflow needs to create the namespace. The RoleBinding can't reference a namespace that doesn't exist yet, just as an IAM policy can't reference a database that doesn't exist yet. So you hand out cluster-scoped create permissions, pre-provision every namespace anyone might need, or file a ticket. Same root cause: the permission is pinned to something that has to exist before the permission can mean anything.
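For readers who want to see the shape of the objects involved, here is a sketch of the two grants as Python dicts that serialize to Kubernetes RBAC manifests. The namespace name, the user, and the object names are hypothetical; the API group, kinds, and the built-in `edit` ClusterRole are standard Kubernetes RBAC.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"
NAMESPACE = "team-a-migration"  # hypothetical namespace that does not exist yet
SUBJECT = {"kind": "User", "name": "dev@example.com", "apiGroup": RBAC_GROUP}  # hypothetical user

# Namespaced grant: it lives inside NAMESPACE, so it cannot be created
# (or mean anything) until that namespace exists.
role_binding = {
    "apiVersion": RBAC_GROUP + "/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "dev-edit", "namespace": NAMESPACE},
    "subjects": [SUBJECT],
    "roleRef": {"apiGroup": RBAC_GROUP, "kind": "ClusterRole", "name": "edit"},
}

# Namespaces are cluster-scoped objects, so permission to create one
# has to be granted at cluster scope as well.
namespace_creator = {
    "apiVersion": RBAC_GROUP + "/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "namespace-creator"},
    "rules": [{"apiGroups": [""], "resources": ["namespaces"], "verbs": ["create"]}],
}
namespace_creator_binding = {
    "apiVersion": RBAC_GROUP + "/v1",
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "dev-namespace-creator"},
    "subjects": [SUBJECT],
    "roleRef": {"apiGroup": RBAC_GROUP, "kind": "ClusterRole", "name": "namespace-creator"},
}

for manifest in (role_binding, namespace_creator, namespace_creator_binding):
    print(json.dumps(manifest, indent=2))
```

The first object is the narrow grant you want to hand out. The other two are the broad grant you end up handing out so that the first one has somewhere to live.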

We've just never felt how broken it is

Every platform engineer reading this has lived some version of this and shrugged, because it's always been survivable.

It's survivable because humans are slow. The ticket-to-ops detour for a Postgres major version stings, but you do it maybe twice a year. The orphaned databases nobody can account for pile up over years, slowly enough that you keep telling yourself you'll clean them up next quarter. The permission model leaks the whole time. The leak is just slow enough that you can mop it up with tedious churn on IAM definitions.

Agents are not slow.

When developers are shipping 10x faster, the cloud underneath them changes 10x faster too. Every migration, every version bump, every "I finished the roadmap so I'm finally going to upgrade Postgres" task that used to sit in a backlog for two years starts coming down the pipeline. And a lot of that deferred work is exactly the create-migrate-destroy churn that the identity-based model handles worst. "We're breaking up the monolith, we need to create 30 new queues and MySQL instances!"

A permission model pinned to specific resource IDs doesn't just slow this down. It breaks, because the resources it's pinned to are being created, migrated, and recreated constantly, and every recreation orphans the grant that referenced the old ID.
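A small simulation shows the orphaning effect. Everything here is invented for illustration: the ID format, the `create` and `recreate` helpers, and the principal name. The count of 30 borrows the queue example from the previous paragraph. Grants are keyed by resource ID, and a recreate hands back a new ID, which is the only assumption the sketch depends on.

```python
import itertools

_ids = itertools.count(1)

grants = {}   # resource_id -> principal allowed to manage it
live = set()  # resource IDs that currently exist


def provision(name: str) -> str:
    """Return a fresh provider-style ID; the same logical resource never gets the same ID twice."""
    return f"{name}-{next(_ids):04d}"


def create(name: str, principal: str) -> str:
    rid = provision(name)
    live.add(rid)
    grants[rid] = principal  # the grant is pinned to this exact ID
    return rid


def recreate(old_id: str, name: str) -> str:
    """Replace a resource (migration, version bump). The old grant is left untouched."""
    live.discard(old_id)
    new_id = provision(name)
    live.add(new_id)
    return new_id


# 30 queues created once, then each recreated once.
first_generation = [create(f"queue-{i}", "dev") for i in range(30)]
for i, rid in enumerate(first_generation):
    recreate(rid, f"queue-{i}")

orphaned = [rid for rid in grants if rid not in live]    # grants pointing at nothing
ungranted = [rid for rid in live if rid not in grants]   # resources nobody can manage
print(f"{len(orphaned)} orphaned grants, {len(ungranted)} live resources with no grant")
# prints "30 orphaned grants, 30 live resources with no grant"
```

One round of recreation leaves every grant dangling and every live resource unmanaged. At human pace, that is a cleanup task; at agent pace, it is the steady state.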

The unit was wrong the whole time

The fix is not a bigger...
