How to Stop Babysitting AI Code

By Rohan Gandhi

LLM agents have made feature work feel almost cheap. You ask for a change, the agent writes code, adds tests and runs them. For a moment, the future looks like vibe coding.

Then the receipts arrive.

The agent imported from the nearest useful path and gave frontend code direct access to the database. That choice made the feature easier to implement, but it let short-term convenience rewrite the architecture.

The obvious next move is to write the rule down. AGENTS.md files, specs and planning documents all try to load architecture into the agent before it writes code.

They help, but prose is still a reminder, not a constraint. The agent can forget it, obey the part that is convenient or claim success without satisfying the rule.

Review became the backstop, which meant checking every diff for shortcuts like that. It helped, but it turned development into a tedious supervision job.

The way out was to separate architectural decisions from implementation details, then add checks that force the implementation to respect those decisions. Once that was in place, review shifted away from generated plumbing and back toward long-lived architectural choices like module boundaries, public APIs and dependencies.

Rules Need a Place to Live

The deeper problem is memory. Humans carry social context forward. Agents need that context reloaded or enforced every run. They can break the same written rule ten times and still treat the eleventh reminder as new information. It’s like hiring a very fast intern who resets every morning.

Documentation is useful context, but a poor home for rules the project cannot afford to break. Context is scarce. Instructions dilute each other. If a rule must hold every time, it should live somewhere the codebase can enforce it.

The Cost Curve Flipped

Software teams already use checks. Important rules leave prose behind and become types, tests, linters, build checks or runtime validation.
Teams usually promoted a rule only when violations were common or enforcement was important enough to justify the work. Even then, they reached for standard lint presets and checks that were cheap to configure.

A custom checker that understands your module layout, your import boundaries and your idea of what a public surface looks like would require significant engineering effort upfront for questionable reward. Standard lint presets, with code review as a quality backstop, were often the better tradeoff.

Coding agents changed the economics in both directions.

They generate drift faster than humans can review it.

They make bespoke checkers cheap enough to prototype in an afternoon.

The source of the drift now helps contain it.

Here is a small example. Agents love to wrap simple arguments in ceremony. Instead of:

    load(id: string)

you get a tiny application form like:

    load(input: { id: string })

Telling the agent “don’t do that” in an instructions file works for a while. Asking the agent to write a check that rejects that pattern in public interfaces took me minutes. From then on, the check handled the reminder and the build failed until the agent fixed it.

The pattern scales easily to rules that actually matter. Frontend must not import the database schema? That’s an import boundary check. Don’t want your validation logic to live in the UI? That’s a source check. Once a rule is a check, the failure is deterministic, the error message names the rule and the agent repairs its own violation before I ever see the diff.

Contracts Outside, Generated Code Inside

In my project, local rules become checks in the build system. They live outside the package source and are grouped by the code they govern, such as UI primitives, UI components and domain packages.

For this example, let’s look at the domain package. It is mostly plain TypeScript code that carries the business logic for my project. The logic is split into small modules with narrow public surfaces.
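A check against the load(input: { id: string }) wrapper can help keep those public surfaces narrow. Below is a minimal sketch of what such a check could look like, not the author's actual checker. The rule name no-single-field-wrapper and the function findWrappedParams are hypothetical. The sketch also assumes a regex over the source text is good enough. It only flags a method whose sole parameter is an inline object literal with exactly one property.

```typescript
// Hypothetical rule name; the article does not say what the real check is called.
const RULE = "no-single-field-wrapper";

// Matches `name(param: { field: Type })` where the only parameter is an inline
// object literal with exactly one property. This is a rough regex sketch: it does
// not skip comments or strings, and field types containing commas or braces
// are not handled.
const WRAPPED_PARAM =
  /(\w+)\s*\(\s*(\w+)\s*:\s*\{\s*(\w+)\??\s*:\s*([^{};,]+?)\s*[;,]?\s*\}\s*\)/g;

export interface Violation {
  rule: string;
  line: number;
  message: string;
}

export function findWrappedParams(source: string): Violation[] {
  const violations: Violation[] = [];
  for (const m of source.matchAll(WRAPPED_PARAM)) {
    const [, method = "", param = "", field = "", type = ""] = m;
    // 1-based line number where the offending signature starts.
    const line = source.slice(0, m.index ?? 0).split("\n").length;
    const fieldType = type.trim();
    violations.push({
      rule: RULE,
      line,
      // The message names the rule's fix so an agent can repair the signature.
      message: `${method}(${param}: { ${field}: ${fieldType} }) wraps a single argument; use ${method}(${field}: ${fieldType})`,
    });
  }
  return violations;
}
```

A build step could run this over every contract/interface.ts and fail when the returned list is non-empty. Multi-field objects such as save(input: { id: string; name: string }) pass, because grouping several fields is a legitimate design choice. A sturdier version would walk the syntax tree with the TypeScript compiler API instead of relying on a regex.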
Each module follows an enforced structure.

Source files split by review responsibility:

    some-module/
      manifest.json        module metadata      (authoritative)
      README.md            documented intent    (authoritative)
      contract/
        interface.ts       public interface     (authoritative)
        create.ts          constructor shape    (authoritative)
        cases.ts           example use cases    (authoritative)
      impl/                implementation       (generated)
        ...                agent-written code   (generated)

    Authoritative: review closely.
    Generated: regenerate when authoritative files change.
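A layout rule like this is one of the easiest to check mechanically. Here is a minimal sketch, not the author's build check, under three assumptions. First, the check receives the module's file paths relative to the module root, using forward slashes. Second, the function name checkModuleLayout is hypothetical. Third, anything other than the authoritative files must live under impl/, which is an inference from the layout rather than a stated rule.

```typescript
// Authoritative files required in every module, taken from the layout above.
const REQUIRED: string[] = [
  "manifest.json",
  "README.md",
  "contract/interface.ts",
  "contract/create.ts",
  "contract/cases.ts",
];

// Assumption: all generated code lives under impl/.
const GENERATED_PREFIX = "impl/";

// Returns one error message per violation; an empty list means the layout is valid.
export function checkModuleLayout(files: string[]): string[] {
  const errors: string[] = [];
  const present = new Set(files);

  // Every authoritative file must exist.
  for (const required of REQUIRED) {
    if (!present.has(required)) errors.push(`missing required file: ${required}`);
  }

  // There must be at least some implementation.
  if (!files.some((f) => f.startsWith(GENERATED_PREFIX))) {
    errors.push(`missing implementation under ${GENERATED_PREFIX}`);
  }

  // Nothing else may sit outside impl/.
  for (const file of files) {
    if (!REQUIRED.includes(file) && !file.startsWith(GENERATED_PREFIX)) {
      errors.push(`unexpected file outside impl/: ${file}`);
    }
  }
  return errors;
}
```

As an example, a module with files manifest.json, README.md, contract/interface.ts, contract/create.ts, impl/index.ts and helpers.ts produces two errors. One reports the missing contract/cases.ts, and the other reports helpers.ts sitting outside impl/. Walking the directory on disk is left out to keep the sketch pure.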

Review starts with the authoritative files. Generated code stays inside impl/**.

The structure is one rule among several. A module must contain the expected files and each file must obey the rules for its role.

The architectural decisions live in the authoritative files. manifest.json, README.md and contract/** define the module’s intent, public surface, dependencies and behavior examples.

The implementation details stay inside impl/**. They must satisfy the contract and stay inside the import boundaries.

Changes to authoritative files can change the project’s shape. They also require impl/** to be regenerated. Changes inside impl/** should stay local to the module. If they escape that boundary, a...
