Post-Mortems for Agent Runs

By Ian Johnson | Jun, 2026 | Medium


The agent burned five hours on a refactor that should have taken one. The first hour was fine. In the second, it was rewriting a module nobody had asked it to touch. The third through fifth went to rolling back, re-planning, and producing the smaller diff we should have had from the start. The work landed late. The team was annoyed. The lesson was sitting there waiting to be picked up.

We did not pick it up. The work shipped, the agent moved on, and the same failure mode showed up two weeks later in a different module. We paid for the first incident twice because we had treated it as an annoyance instead of a learning moment.

The fix was obvious in retrospect. Agent failures want post-mortems, the same way human incidents do. The practice does not transfer automatically, and most teams have not built the habit.

What the post-mortem is actually for

The point of a post-mortem is not to assign blame. With agents, that is even more true. There is no person to blame, and pretending the agent is a person leads to bad analysis. The agent did what the agent does. The question is where the team’s setup failed to catch it.

The setup is a stack of layers: the task description, the harness rules, the hooks, the tests, the code review, and the human supervising. A failure is a place where one of those layers should have caught the problem and did not. The post-mortem’s job is to find which layer missed and patch it.

The deliverable is a small set of changes: a rule added or modified, a hook tightened, a feature doc written, a task-writing practice updated. Not a list of “things to remember.” A list of changes to the system.

When to run one

Not every failure deserves a full post-mortem. The cost is real and the bar should be too.

I run one when the failure cost at least an hour of redo, or when the same failure type has shown up twice in two weeks. The first condition catches the expensive single incidents.
The second catches the cheap repeats that compound into expensive patterns.

The other trigger: a near-miss that would have been a real incident if a particular reviewer had not caught it. Near-misses are the most underused signal in agent work. The harness was almost not enough. The reviewer was the last line of defense. The next failure of the same kind might not have that reviewer in the room.

A near-miss is a free post-mortem. The cost has not been paid yet. The lesson is available for the price of the analysis.

Five sections, one page

The shape that works for me is short and rigid.

What happened. Three sentences, no commentary. For example: The agent was asked to add a validation step to the user-import flow. It produced a 600-line PR that rewrote the entire flow. The PR was caught in review and rolled back.

What the agent was working from. The task description. The harness rules that fired. The context it had loaded. If the failure came from a missing rule, the absence is the finding. If it came from a misread task, the description is the finding.

Where the layers failed. Walk down the stack. For each layer, ask: was this failure mode in scope, and if so, why did the layer not catch it? This is the load-bearing section. The answers are the changes.

What changes. One or two specific edits. A rule added. A hook tightened. A feature doc scoped to a path. A change to how tickets are written for that kind of work. Small enough to land the same week as the post-mortem.

What we are not doing. This is the part teams skip and then regret. A post-mortem produces a temptation to add three rules, two hooks, and a process. Most of them will be wrong. Name the ones you considered and rejected, with the reason. The next post-mortem on a similar failure can revisit them, with evidence.

The whole document is one page. If it grows past a page, the failure is being over-analyzed and the changes are being padded.
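
For teams that track agent failures in a script or a small tool, the triggers and the five-section shape can be written down as data. What follows is a minimal sketch, not a prescribed implementation. The names `Failure`, `PostMortem`, `LAYERS`, `should_run_postmortem`, and `problems` are hypothetical. The one-hour and two-week thresholds come from the text. The check that at least one rejected idea is recorded is an added assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# The stack of layers named in the article. The tuple itself is illustrative.
LAYERS = (
    "task description", "harness rules", "hooks",
    "tests", "code review", "human supervisor",
)


@dataclass
class Failure:
    kind: str                # hypothetical label, e.g. "scope creep"
    day: date
    redo_hours: float
    near_miss: bool = False  # caught only because a reviewer happened to notice


def should_run_postmortem(failure: Failure, history: list[Failure]) -> bool:
    """Return True if any of the three triggers fires."""
    # Trigger 1: at least an hour of redo. Trigger 3: a near-miss.
    if failure.redo_hours >= 1.0 or failure.near_miss:
        return True
    window = timedelta(days=14)
    # Trigger 2, "twice in two weeks": this failure plus one earlier
    # failure of the same kind inside the window.
    return any(
        f.kind == failure.kind and timedelta(0) <= failure.day - f.day <= window
        for f in history
    )


@dataclass
class PostMortem:
    what_happened: str
    working_from: str
    layer_findings: dict[str, str]  # layer -> why it did not catch the failure
    changes: list[str]
    not_doing: dict[str, str]       # rejected idea -> reason for rejecting it

    def problems(self) -> list[str]:
        """List the ways this record departs from the one-page shape."""
        issues = []
        for layer in self.layer_findings:
            if layer not in LAYERS:
                issues.append(f"unknown layer: {layer}")
        if not 1 <= len(self.changes) <= 2:
            issues.append("expected one or two specific changes")
        # Assumption: require at least one rejected idea to be written down.
        if not self.not_doing:
            issues.append("name at least one rejected idea and why")
        return issues
```

A cheap failure with no recent history returns `False` from `should_run_postmortem`. The same failure with a matching entry ten days earlier returns `True`. Keeping `changes` capped at two mirrors the "small enough to land the same week" constraint. The cap is a convention a team could loosen.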

Where most teams miss the lesson

The pattern I see in teams that adopt agents but do not improve over time: failures get framed as the agent’s failures. The response is “the agent is not ready for that,” and the rope gets pulled back without a corresponding investment in the harness.

The next failure of the same shape happens to the next contributor, who has no record of the first one. They make the same trust adjustment, alone. The team’s collective knowledge of where the agent is reliable stays the same. The harness does not improve because nobody wrote down what should change. The fix is the post-mortem and the patching. The failures are the agent’s, in a literal sense. They are also the team’s, in the sense that the...
