How AI coding agents actually use your technology - Microsoft for Developers
Waldek Mastykarz
Principal Developer Advocate
You ship an SDK, a CLI, an API, and developers use it. Now AI coding agents use it too, except they use it differently than humans do. Most of the time you have no idea what’s actually happening between "developer types a prompt" and "agent generates code with your technology." Is the agent reading your docs? Is it calling your MCP server? Is it ignoring both and guessing from memory?
In the previous article, we introduced the AX stack: model, harness, and agent extensions. We talked about what’s fixed and what you can influence. This time, let’s trace through what actually happens, step by step, when an agent encounters your technology. Because until you see the mechanics, you can’t fix what’s breaking.
What happens when a developer says "build me something"
A developer opens their coding agent, types a prompt: "Build me a REST API with authentication using Contoso Identity." Here’s what happens next.
Step 1: The harness assembles context
Before anything hits the model, the harness (Copilot, Claude Code, Cursor) assembles the context window. The VS Code team recently published a deep dive into how their harness works, covering context assembly, tool exposure, and the agent loop. The harness pulls together:
the system prompt (harness-specific, you can’t see or change it)
environment details: the developer’s OS, the full path to the working directory
workspace files the harness thinks are relevant
tool descriptions from installed extensions (MCP servers, skills, custom agents)
conversation history
any instruction files (.github/copilot-instructions.md, AGENTS.md, etc.)
the developer’s prompt
This is just an example, because every harness is different. Still, given the context window sizes of the popular LLMs used for coding, you can see how quickly such a setup fills up the available tokens. The harness decides what makes the cut. If the developer has 20 extensions installed, the harness might summarize tool descriptions, drop some entirely, or rank them by estimated relevance. Your extension’s description is competing for space before the model even sees it. If it exceeds the harness’s length limit (each harness sets its own), it gets ignored entirely, no matter how relevant it is. And details you’d never think about, like the OS or the directory path, influence the model’s decisions. The model will generate platform-specific code, assume different toolchains, even pick different default configurations based on what it sees here.
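To make the competition for space concrete, here is a minimal sketch of how a harness might pick which tool descriptions to include. Everything in it is an assumption for illustration: the `ToolDescription` type, the relevance scores, the four-characters-per-token estimate, and the limits are hypothetical, and real harnesses do not publish this logic.

```python
from dataclasses import dataclass


@dataclass
class ToolDescription:
    name: str
    text: str
    relevance: float  # hypothetical score assigned by the harness, 0.0 to 1.0


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def select_tool_descriptions(tools, budget_tokens, max_description_chars):
    """Return the descriptions that fit the budget, most relevant first."""
    # Descriptions over the length limit are dropped outright,
    # no matter how relevant they are.
    eligible = [t for t in tools if len(t.text) <= max_description_chars]
    eligible.sort(key=lambda t: t.relevance, reverse=True)

    selected = []
    remaining = budget_tokens
    for tool in eligible:
        cost = estimate_tokens(tool.text)
        if cost <= remaining:
            selected.append(tool)
            remaining -= cost
    return selected


# Hypothetical extensions installed by the developer.
tools = [
    ToolDescription("contoso_identity_setup", "Set up authentication with Contoso Identity.", 0.9),
    ToolDescription("image_resizer", "Resize and crop images in the workspace.", 0.1),
    ToolDescription("verbose_tool", "x" * 5000, 0.95),
]

kept = select_tool_descriptions(tools, budget_tokens=12, max_description_chars=1024)
print([t.name for t in kept])
```

With these made-up numbers, only `contoso_identity_setup` survives: `verbose_tool` has the highest relevance score but exceeds the length limit, and `image_resizer` no longer fits the remaining budget.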
Step 2: The model reads the room
The model receives this assembled context and does something humans don’t: it reads everything at once. The system prompt, the tool descriptions, the workspace context, the developer’s prompt. It builds a mental model of what’s available and what the developer asked for.
Here’s where training data matters. If the model has seen your technology during pre-training, it already has opinions. It knows (or thinks it knows) your API patterns, your SDK conventions, your common error messages. If it hasn’t seen your technology, it has nothing, and it’ll either ask for help or guess based on similar technologies. Either way, the model’s job at this point is to decide what to do first: does it have enough information to start coding, or does it need to call a tool?
It turns out that this decision is a combination of the behavior encoded in the model and the instructions the harness adds on top. Some agents are more inclined to call tools straight away, while others rely on their own knowledge and only use tools when the user asks them to. Some agents tend to search for the latest information on the internet first, while others prioritize efficiency and start working on the task if they feel they know enough. So even if you ship a great extension, one agent might call it proactively while another never touches it unless the developer explicitly asks.
Step 3: Tool selection (or not)
If the model decides it needs more information, it looks at the available tools and skills. This is where your MCP server’s tool descriptions and skill definitions matter. The model reads each description and decides: does this help with what the developer asked? Notice that this decision is based on semantic matching, not keyword search. The model is interpreting intent. If the developer said "authentication" and your tool is described as "configure identity provider settings," the model has to bridge that gap.
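The size of that gap is easy to see with a toy word-overlap check. This is not how a model matches intent. It only shows how little shared vocabulary the description above gives the model to work with, compared to a description that names the task in the developer's own words. The second description is a hypothetical rewrite, not taken from any real tool.

```python
import re


def words(text: str) -> set:
    # Lowercase and split into alphabetic words.
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_overlap(intent: str, description: str) -> set:
    # Words that appear in both the developer's intent and the tool description.
    return words(intent) & words(description)


intent = "authentication"

# The description from the example above.
vague = "configure identity provider settings"

# Hypothetical rewrite that names the task the way a developer would.
explicit = "Add authentication to an app by configuring identity provider settings"

print(keyword_overlap(intent, vague))
print(keyword_overlap(intent, explicit))
```

The first description shares no words with the developer's intent, so matching depends entirely on the model inferring that identity provider settings relate to authentication. The second leaves less to inference.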
And even when your description matches the intent perfectly, the model might still skip it. If the task looks simple enough, or if the model feels confident it already knows the answer, it won’t bother calling your tool. It’ll just go with what it has. This is especially painful when the model has some training data for your technology but it’s...