How to break your AI agent

How to Break Your AI Agent: 6 Ways They Fail in Use

Digital Thoughts


How to Break Your AI Agent (Basics)

The quiet ways the agent you already run falls apart in use, and how to make fewer of them.

Pawel Jozefiak Jun 24, 2026

I’ve been building my own AI agent for months now. Claude Code, Codex, a pile of other tools wired together into one thing that runs my day. A while back I wrote about how to build your first one, and that post covered a few of the mistakes I made while putting it together.

This one is different. It is about breaking the agent you already have. The one you use every day. The one running in the background while you sleep, the one you trust to actually do things. I also wrote about the time I almost fried my agent and my Mac Mini, but those were my specific accidents. This is the general version. The field guide. The list of ways any agent goes sideways once it is in your hands, so you can make fewer of these mistakes, or at least see them coming.

And here is the part people skip past. This is not only about custom agents like mine. The same things break ChatGPT, Claude, a Zapier flow, whatever framework you picked off the shelf. AI moves fast. Speed does not make a thing unbreakable. You can break anything you build the moment you start using it, and an agent is no exception.

First, the reason agents break at all

It is not that the model is dumb. It is math. An agent does work in steps: read this, call that tool, decide, act, check, repeat. Steps multiply; they do not average. If each step works 99% of the time and the steps fail independently, ten steps in a row work about 90% of the time. A hundred steps, around 37%. A thousand steps, basically never. Researchers measured this directly, and it gets worse, because the errors are not independent: one wrong step nudges the next one wrong too. So every time you add surface to your agent (more tools, more memory, more steps, more things it can touch), you are not adding risk in a straight line. You are multiplying it. Keep that in your head. It quietly explains every item below.
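The compounding is easy to check for yourself. Below is a minimal Python sketch, assuming every step succeeds independently with the same probability (the simplest model, and the paragraph's own caveat is that real agent errors are not independent). `chain_success` and `required_step_rate` are illustrative names, not part of any library.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Chance that n independent steps all succeed, each with probability p_step."""
    return p_step ** n_steps


def required_step_rate(target: float, n_steps: int) -> float:
    """Per-step success rate needed for an n-step chain to hit the target overall."""
    return target ** (1 / n_steps)


for n in (10, 100, 1000):
    print(f"{n:>4} steps at 99% each: {chain_success(0.99, n):.1%} end to end")

# To finish a 100-step task 90% of the time, each step has to be right ~99.89% of the time.
print(f"{required_step_rate(0.90, 100):.2%}")
```

The second function is the uncomfortable one: every step you add raises the bar that every single step has to clear.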

1. You overbuild it

This one is mine. I have a real problem with overbuilding.


When I want something, I build it. Then I want another thing, so I build that too. Each piece works on its own. The trouble is the pile. Building toward your own needs feels right in the moment, because it is natural. You need this, you need that, so you keep adding. I learned the cost of it too late. More pieces means a higher chance that something, somewhere, is broken at any given moment. That is just probability.

And past a point the tools start working against the agent instead of for it. There is a name for this now: Microsoft researchers call it tool-space interference. Give a model too many tools and it picks the wrong one, burns tokens, or invents a call that does not exist. Studies have measured drops in tool-selection accuracy of anywhere from 7 to 85% as the catalog grows, and it gets worse for tools sitting in the middle of a long list, the same lost-in-the-middle effect that hits long context. OpenAI caps a single request at 128 tools, and coding agents like Cursor warn that quality slips well before you stack even a few dozen. Either way, the cap is not where the trouble starts. Every tool you add competes for the model’s attention long before you hit any limit.

Context has the same shape of problem. There is solid research on context rot now: across 18 frontier models, accuracy fell 30 to 50% as more was stuffed into the window, well before the window was even full. On the million-token models it started showing up around 300 to 400 thousand tokens. So an agent buried in its own accumulated context gets measurably worse, not because it ran out of room, but because the room got noisy.

The fix is not glamorous. Prune. Do a periodic checkup on your skills, your tools, your memory, your core files. If your error registry is lit up red, the agent is already telling you something is broken, and you stopped reading it.
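Part of that checkup can be mechanical. Here is a minimal Python sketch, assuming the agent already logs how often each tool is called and how often those calls fail over some review window. The `ToolStats` record, the `audit_tools` function, the thresholds, and the tool names are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToolStats:
    name: str
    calls: int   # times the agent invoked the tool during the review window
    errors: int  # how many of those calls failed


def audit_tools(stats: list[ToolStats], min_calls: int = 1,
                max_error_rate: float = 0.2) -> dict[str, list[str]]:
    """Sort tools into keep / prune / fix buckets for a periodic checkup."""
    report: dict[str, list[str]] = {"keep": [], "prune": [], "fix": []}
    for s in stats:
        if s.calls < min_calls:
            report["prune"].append(s.name)  # barely used: still competes for attention
        elif s.errors / max(s.calls, 1) > max_error_rate:  # max() guards min_calls=0
            report["fix"].append(s.name)    # used but failing often: the red entries
        else:
            report["keep"].append(s.name)
    return report


stats = [
    ToolStats("calendar_read", calls=140, errors=2),
    ToolStats("legacy_rss_fetch", calls=0, errors=0),
    ToolStats("email_send", calls=30, errors=12),
]
print(audit_tools(stats))
# {'keep': ['calendar_read'], 'prune': ['legacy_rss_fetch'], 'fix': ['email_send']}
```

The thresholds are arbitrary. The point is that the checkup ends in a short list you act on, not a dashboard you stop reading.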
I keep my CLAUDE.md tight partly for this reason, to keep the core readable instead of letting it bloat into something nobody can hold in their head.

There is a structural fix too, and it is the one I would push hardest. Stop showing the agent everything at once. The pattern that holds up in production is search-then-load: the agent keeps a small index of what it can do, looks up the few tools it needs for the task in front of it, and loads only those. Same idea for context. Treat the window like a budget you spend on purpose and compact it often, rather than letting months of history pile up and rot. The agent that carries less is the agent that stays sharp.
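Here is a minimal Python sketch of both halves of that idea, under loud assumptions: `ToolEntry`, `search_tools`, `word_count`, and `fit_to_budget` are hypothetical names; naive keyword overlap stands in for whatever retrieval a real setup would use (embeddings, BM25, or asking the model); a word count stands in for a real tokenizer; and the placeholder string stands in for a summary the model would actually write.

```python
from dataclasses import dataclass


@dataclass
class ToolEntry:
    name: str
    description: str  # one short line; this is all the agent sees up front


def search_tools(index: list[ToolEntry], task: str, k: int = 3) -> list[str]:
    """Return up to k tool names whose descriptions share the most words with the task."""
    task_words = set(task.lower().split())
    scored = []
    for entry in index:
        overlap = len(task_words & set(entry.description.lower().split()))
        if overlap:  # skip tools with nothing in common with the task
            scored.append((overlap, entry.name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best match first, ties by name
    return [name for _, name in scored[:k]]


def word_count(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer


SUMMARY_RESERVE = 5  # words set aside for the placeholder summary line


def fit_to_budget(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages that fit the budget; fold older ones into one summary line."""
    room = budget - SUMMARY_RESERVE
    kept, used = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = word_count(msg)
        if used + cost > room:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    dropped = len(messages) - len(kept)
    if dropped:
        # A real agent would ask the model for an actual summary here.
        kept.insert(0, f"[summary of {dropped} older messages]")
    return kept


index = [
    ToolEntry("calendar_read", "read calendar events for a date"),
    ToolEntry("email_send", "send an email message"),
    ToolEntry("file_search", "search local files by name"),
    ToolEntry("weather_get", "get the weather forecast for a city"),
]
print(search_tools(index, "send an email about tomorrow's calendar events"))
# ['email_send', 'calendar_read']

history = [f"message {i} " + "x " * 8 for i in range(10)]  # ten 10-word messages
print(fit_to_budget(history, budget=40))
```

Only the one-line descriptions stay in front of the model all the time; full tool definitions would be loaded just for the names the search returns. Trimming from the oldest end keeps the latest turns intact, a common default, though what gets summarized versus kept is a judgment call for your own agent.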

2. You make one agent do everything

My first idea of an agent was Jarvis. One mind that handles all of it. I owned up to this in the first-agent post too. It is the romantic version everyone starts with. For strictly...
