Garbage in the Loop

Brian Kihoon Lee

2026-06-24

Tagged: llms

Since Opus 4.5 came out last November, my daily experience of driving coding agents feels like it’s stagnated. It’s not that AI has plateaued; it continues to accomplish feats that were not even in my realm of possibility a year ago: codebase rewrites pinned to extensive unit tests, mathematical theorem proving, security exploit discovery, autoresearch, etc. I am starting to believe that I am the bottleneck, not because I have to sit there and prompt the agent, but because the agent needs to untangle my sloppy prompting.

In this essay, I consider where we should focus our efforts now that frontier models are no longer the dominant source of garbage.

A garbage loop example

“Garbage in the loop” is a fusion of two phrases: “garbage in, garbage out”, and the agentic loop.

A typical garbage loop (as of early 2026) might go as follows:

I ask the agent: “I’m seeing bug X on system Y. It’s coming from the Z module, go investigate and fix.”

The agent greps the monorepo for Z and comes up with 4 search results, only one of which is in system Y. But system Y’s name isn’t part of the filepath, so the agent doesn’t know which one is correct. As far as it can tell, all of the results might be different parts of system Y.

It wastes some time digging further before finding the right module Z.

Along the way, the agent has learned some plausible concepts about system Y’s doppelgangers that don’t actually exist in system Y.

It misdiagnoses the error, hallucinating some chain of causality that would make sense if those other concepts actually existed – but they don’t.

It invents a fix, tests it, and declares the task complete.

When I tell it that I’m still seeing the error, it starts inventing new epicycles upon epicycles to explain away the discrepancy.

I give up, clearing the session and reprompting with the right filepath.

The original garbage was my insufficiently precise reference to system Y, and the agentic loop failed to recover.

Here are the possible intervention points:

I could have skipped mentioning Y or Z and just referenced the URL of the page where I saw the error. Alternatively, I could have named the exact filepath to system Y, or changed directories into system Y so that the model’s search returned the right results.

The harness could have indicated something about me and the systems I usually work on, to help the agent disambiguate.

The environment could be cleaned up so that systems are better documented - either self-documented through better naming, or with AGENTS.md indicating common confusion.

Frontier labs could train smarter models that, at higher reasoning levels, spend more time figuring out the relationship (or non-relationship) between the search results, and are more thorough in testing their hypotheses.
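The first intervention (changing directories into system Y before searching) can be sketched in a few lines. This is only an illustration under stated assumptions: the monorepo layout, the team folder names, and the helpers `find_module_dirs` and `build_demo_monorepo` are all hypothetical, and a real agent would more likely run grep or a similar tool than walk the tree in Python.

```python
import tempfile
from pathlib import Path


def find_module_dirs(search_root: Path, module_name: str) -> list[Path]:
    """Return every directory under search_root whose name equals module_name."""
    return sorted(p for p in search_root.rglob(module_name) if p.is_dir())


def build_demo_monorepo(base: Path) -> None:
    """Create a hypothetical layout with four unrelated modules all named 'z'."""
    for team in ("payments", "billing", "ledger", "reports"):
        (base / "services" / team / "z").mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    build_demo_monorepo(repo)

    # Assumption: system Y happens to live under services/ledger, and nothing
    # in that path mentions "Y", which is exactly what confuses the agent.
    system_y_root = repo / "services" / "ledger"

    repo_wide = find_module_dirs(repo, "z")        # what the agent sees from the repo root
    scoped = find_module_dirs(system_y_root, "z")  # what it sees after cd-ing into system Y

    print(len(repo_wide), "candidates repo-wide")
    print(len(scoped), "candidate when scoped to system Y")
```

The search itself is identical in both calls; only the starting directory changes, and with it the number of doppelgangers the agent has to reason about.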

Division of Responsibilities

User, harness, environment, and model. Whose responsibility is it to fix these garbage loops?

One extreme take is that as models get smarter, they’ll just figure it out themselves, given sufficient tokens and a memory device of some sort. Any attempt to short-circuit this process would be a violation of The Bitter Lesson, which warns us that building knowledge into our agents is effective in the short run and actively harmful in the long run.

The antipodal stance (call it the Babbage take) is that models should restrict themselves to doing exactly what they’re asked, and that it’s the user’s responsibility to state in the prompt exactly what they want done.

The theory of risk compensation, under which people take bigger risks as they feel safer, suggests that we will push agents right up to the very precipice of their usefulness. Thus, we end up with a very rough equation:

User prompt quality + harness quality + environment quality + model quality = agent quality = task scope

That’s the equation if we assume all four parties are working towards improving their game. Otherwise, we might have something closer to the following:

Harness quality + model quality = agent quality = task scope + tolerance to bad user prompt + tolerance to bad environment

You won’t be able to tackle more ambitious tasks if the user slurps up all improvements in agent quality by being worse at prompting, or if the codebase/information environment fills up with slop.
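Rearranging the second relation for task scope makes the trade-off explicit. The symbols are shorthand introduced here for illustration, not measured quantities: S for task scope, H and M for harness and model quality, and T_p and T_e for the tolerance spent on bad user prompts and on a bad environment.

```latex
S = (H + M) - T_p - T_e
```

With H + M held fixed, every unit added to T_p or T_e comes straight out of S, which is the sense in which sloppier prompting or a sloppier codebase absorbs improvements in agent quality.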

Garbage ontology

Stepping back, here’s a more complete list of garbage sources:

User

I told the agent to use a wrong approach (forgot about a constraint, assumed a fix would be simple but it wasn’t, picked a bad solution).

I prompted in the wrong environment (the wrong agent conversation, the wrong SSH terminal, the wrong git branch/worktree/checkout).

I gave an underspecified prompt which could quite reasonably be interpreted differently.

I maintained one very long multi-topic session, which contained instructions that were only correct with respect to previous topics.

I passed in unprocessed prompts (e.g. raw user feedback instead of curated JIRA tickets).

Harness

System prompt was fixing...
