Getting agents to code less slop

2026-06-06

Coding agents have completely reshaped the way I work. I don’t think this is going to come as a surprise to anyone who’s been using these tools in the last 3-6 months. I now spend most of my engineering time building plans and reviewing agent code.

I don’t know that I’ll ever stop wanting to review agent code. I understand that probably makes me much slower compared to people who accept it as-is and ship to prod, but at the very least it seems like I’m not the only one[1].

My rationale is twofold:

I echo the sentiments of those who talk about losing touch with code and its architecture. Despite my efforts to combat that, I still find myself asking “how did that thing work again?” Maybe it’s because I’m now covering a lot more breadth over a shorter period of time. Maybe I’m getting older. Maybe it’s both. I don’t remember finding myself in this position as often as I do now.

If I’m going to be on the hook for the code I ship, then I want to know what’s in it and I want to build it with posterity in mind. I’ve spent quite a lot of time learning what actually works in production: why would I throw that all away? If I can use that experience to shape outcomes, why wouldn’t I?

When I review agent code, I look for architecture or implementation problems. Architecture problems commonly arise from non-exhaustive planning where there was some gap in my understanding of the problem space that wasn’t filled by either my own research or the agent’s. Implementation problems commonly arise out of wrong architecture choices: solutions that don’t fit the framework, the language or even the structure of the existing codebase. These problems are not unique to agents.

Agents have introduced a new kind of implementation problem, one that arises from the agent’s inherent stochasticity. Even when you get the architecture right, the agent’s taste in code structure and modularity is informed more by which branch of the decoding loop it took than by any objective measure.

All of these scenarios produce slop. This post is about this new kind.

Partway through one of my review sessions, I asked myself a simple question: can the intuition I have about what makes code “clean” be automated? If it can be automated, can an agent use it?[2]

Static analysis has been in the engineering toolbelt for a very long time[3], and tools[4] that analyze programs are abundantly available and regularly used. One class of static analysis I haven’t seen used broadly[5] is the kind of tool that tells you your code has too many conditionals, or that your functions are too big, or that you have an object with methods that don’t share any state. Essentially, the kind of tool that tells you how sloppy your code is.

I attribute this to two things:

Building consensus is hard[6]. Even more so when it’s about something subjective like taste.

Until recently, there was no generic tool that could automatically transform a body of code in a way that minimizes this kind of objective.

I[7] wrote mdlr to give agents that objective, because:

- I now have half the time to review 2-3x the code.

- I need to be able to jump into any part of the diff or codebase and quickly get up to speed.

- I want one tool that I can use across multiple programming languages.

- Cleaning up dirty code is not my preferred method for staying sharp[8].

mdlr scans a codebase and outputs a list of metrics and their associated symbols sorted in descending severity. Here’s an example output.

$ mdlr check --pretty
metric         symbol                       value  bucket
function_size  replay_endpoint_error::main  141    critical
cognitive      replay_endpoint_error::main  26     critical
cyclomatic     replay_endpoint_error::main  18     critical

Getting an agent to use this is very simple: ask it to run mdlr prompt and follow the instructions.

$ mdlr prompt

# Auto-Improve

Use mdlr to identify and improve modularity issues in the codebase.

## mdlr Reference

### Quick Start

# Analyze codebase (diff mode on branches, all files on main/master)
mdlr check

# Force all files even when on a branch
mdlr check -A

# Analyze specific directory or file
mdlr check src/metrics
mdlr check src/main.rs
...

Here’s an example that Claude was able to improve entirely on its own with this tool. This is part of a script I had it build to help me debug query timeouts in ClickHouse.

def main():
    # ...argparse setup for --set, --delete, --gap, --explain, --output...
    args = parser.parse_args()

    if not args.file.exists():
        print(f"Error: {args.file} not found", file=sys.stderr)
        sys.exit(1)

    error = json.loads(args.file.read_text())
    endpoint_name = error.get("endpoint_name")
    if not endpoint_name:
        print("Error: no endpoint_name found in error file", file=sys.stderr)
        sys.exit(1)

    # Extract request from request_json (toJSONString output) or request
    request = {}
    if "request_json" in error:
        request = json.loads(error["request_json"])
    elif "request" in error:
        request = error["request"]
    if...
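For illustration only, and not the change Claude actually made: one way the visible part of main could be split into smaller single-purpose units, which is the kind of restructuring that brings function size and complexity down. The helper names load_error, require_endpoint_name, and extract_request are hypothetical, and whatever follows the truncated `if...` in the original is left out rather than guessed at.

```python
import json
import sys
from pathlib import Path


def load_error(path: Path) -> dict:
    """Read and parse the error file, exiting if it does not exist."""
    if not path.exists():
        print(f"Error: {path} not found", file=sys.stderr)
        sys.exit(1)
    return json.loads(path.read_text())


def require_endpoint_name(error: dict) -> str:
    """Return the endpoint name, exiting if it is missing or empty."""
    endpoint_name = error.get("endpoint_name")
    if not endpoint_name:
        print("Error: no endpoint_name found in error file", file=sys.stderr)
        sys.exit(1)
    return endpoint_name


def extract_request(error: dict) -> dict:
    """Prefer request_json (a JSON string); otherwise fall back to request."""
    if "request_json" in error:
        return json.loads(error["request_json"])
    return error.get("request", {})
```

With helpers like these, main would shrink to argument parsing plus a few calls, and each piece can be read and reviewed on its own.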
