Agent Experience Stack

The AX stack: what's fixed, where you can win - Microsoft for Developers

Search Search

No results

Cancel

Waldek Mastykarz

Principal Developer Advocate

AI coding agents promise to make you more productive. On the surface they do, but in practice they fall short: agents generate code that doesn’t compile, use a deprecated SDK, or pick the wrong service entirely. Is it you using it wrong? Is it your tech stack? Or is it the tools you haven’t configured yet?

The stack between a developer’s prompt and the generated code has layers. Some of those layers are fixed: you can’t change them no matter what you do. But there’s one layer where you have all the leverage. And if you don’t know which is which, you’ll waste time optimizing the wrong thing without seeing any results.

This is the first article in a series about Agent Experience (AX): the practice of making AI coding agents work correctly with your technology. The series covers what you can and can’t control in the agent stack, how to measure whether your extensions are helping or hurting, and how to iterate toward better outcomes.

The stack

When a developer asks an AI coding agent to build something with your technology, here’s what actually happens: the developer sends an instruction to the agent. The agent sends it to the LLM along with information about the workspace and available tools (MCP, skills, etc.). The LLM processes the instruction and responds with instructions to the agent about which tools to call. This repeats until the LLM considers its job done or needs more information from you.

Developer prompt → Agent (harness) → Model → Agent extensions (skills, MCP servers, instructions, custom agents) → Your technology surface (CLI, SDK, API) → Generated code

Three layers matter for this conversation: the model, the harness, and the agent extensions. Each has a different owner, different constraints, and a different relationship to you.

The model

The model is a fixed constraint: you didn’t train it, you can’t retrain it, and you can’t control what’s in its weights.

If the model learned your API from outdated docs, it’ll generate code using deprecated patterns. If it never saw your technology during training, it’ll hallucinate something plausible and wrong. And if there’s a competing technology that has more training data, the model will default to it even when yours is the right fit.

You can’t fix this directly. What you can do is provide information at inference time that overrides or supplements what the model knows. That’s what agent extensions are for. But you need to understand: the model is the foundation, and its biases are the default behavior. Everything you build on top is fighting or reinforcing those defaults.

The harness

The harness is the agent itself: Copilot, Claude Code CLI, Cursor, Windsurf, whatever the developer is using. It controls the system prompt, the tool-calling protocol, and how the context gets assembled. It decides what gets included in the context window, what gets dropped, and what the agent does next.

You don’t control the harness either. You might build an extension that works perfectly in Copilot and breaks in Claude Code CLI because the two harnesses handle tool descriptions differently. The harness decides how the agent consumes your extensions, and that decision is opaque to you. As a result, the same MCP server, with the same tool descriptions, can produce completely different results across harnesses. The harnesses interpret and invoke them differently. Your extension isn’t running in a vacuum. It’s running inside someone else’s orchestration layer.

So when developers are telling that some model is better than another, they’re telling only half of the story. The same model will work differently in different harnesses, and you should consider them both when evaluating performance.

Agent extensions

Agent extensions are the surface you control. They’re everything you can put in front of the agent to shape its behavior: skills, MCP servers, instruction files, and custom agents. If you own a technology, you ship these to help agents use it correctly. If you’re a developer, you configure them in your workspace to get better results.

Agent extensions teach the model about your technology, correct its misconceptions, and steer it away from competing approaches. They’re how you get the model to do what you want instead of what it defaults to. They’re also how you inject up-to-date information that the model might not have learned during training. But agent extensions don’t exist in isolation: they compete.

The zero-sum context window

Every agent has a finite context window. Your MCP server’s tool descriptions, your instruction files, your skill definitions. They all consume tokens. And so do everyone else’s.

When a developer has 15 extensions installed and asks the agent to do something, the harness has to decide which tools to invoke, which context to include, and what to...

Agent Experience Stack

Related Articles

Amazon, Facebook, FBI have access to a private intelligence-sharing network

Show HN: GoPeek – open links in live mini browser windows without new tabs

Agent Memory: An Anatomy

SpaceX not the behemoth everyone thought

Naphtha Shortages Having a Growing Impact in Japan