The Paradox of the Fast Engineer

In METR's controlled study, sixteen experienced developers reported a 20% AI speedup; the stopwatch measured them 19% slower. The judgment that closes that gap is the deposit the agent now skips.

By Ruslan Tolkachev

TL;DR The judgment that lets an engineer override a model is built in the slow work the model now offers to do for them. Accept enough of that help on the work that would have built the judgment, and the agent’s speed arrives without the quality, security, scalability, maintainability, or operational sense that the slow work used to deposit alongside the code.

Three months after shipping, customers start complaining that menu items are missing from the navigation. The query that builds the menu does three LEFT JOINs against a self-referencing categories table. The agent produced that shape when the engineer described the requirement; review passed because the test fixtures were three levels deep. Production grew to seven. The query was silently truncating subcategories the day it shipped, and the engineer who accepted the output had never reached for a recursive CTE, because nobody on the team had ever shown them one.

The fix is a recursive CTE with UNION ALL, anchored on the root row and joining the source table back to itself until no more rows come out. Five lines. Both shapes are valid SQL; the one that holds up against arbitrary depth is the one the engineer reaches for only after seeing it before. Without that prior, the idiom isn’t in their decision space. They can’t ask the agent for it, and they wouldn’t recognize it as the right answer if the agent offered it. No memory of a broken version that lacked it, no internal alarm that “three LEFT JOINs against a tree” is the shape of a future incident.

The obvious fix isn’t the fix

Review the agent’s code before approving it. True, and insufficient. The reviewer who has never written a tree walk over a self-referencing table doesn’t know what they should be looking for. They see SQL that compiles, returns rows on the test data, and matches the shape of the request. The internal alarm that says “this assumes a fixed depth, what happens when the tree is deeper than the joins” doesn’t come from reading SQL. It comes from writing the broken version yourself, watching it fail in production, and tracing the failure back through your own assumption. Code review without that prior pain is pattern matching against the surface of the query. The bugs that ship through review are the ones where the surface looks right.

The paradox

Here is the paradox.
The judgment that lets an engineer override the model is built in the slow work the agent now offers to do for them. The engineer who accepts the output, reviews it briefly, and ships it has gotten the speed. They have not gotten the read on whether the query holds under the production tree shape, the security sense for whether the patch closed the CVE without invalidating something downstream, the scalability instinct for whether the join multiplies under real data, the maintainer’s eye for whether this diff just doubled the toil bill six months from now, or the operational feel for which parts of the system are load-bearing and which are decoration. None of those come bundled with the agent’s output. The five-minute version of the recursive CTE problem passes through them without depositing anything, the way watching someone debone a chicken on YouTube does not teach you when the knife is sharp enough.

The pattern shows up in the public data. METR ran a controlled study in July 2025 on sixteen experienced open-source developers working in repositories averaging more than a million lines and a decade old. The developers self-reported a 20% speedup from AI assistance. Measured against the control, they were 19% slower. That is a gap of roughly forty points between what the engineer feels and what the stopwatch records, on a population that does this work for a living.

Google’s 2025 DORA report found 90% of developers using AI and over 80% reporting it made them more productive, while organizational delivery metrics stayed roughly flat for teams without strong measurement practices. The same report measured bugs per developer up 54% and the median time a pull request spends in review up 441%. The verification work the agent created moved to the reviewer.
The reviewer is now the bottleneck the agent isn’t helping, and the skill that makes a reviewer fast (the recognition of which agent-generated PR is hiding a fixed-depth assumption, or a missing index, or a quietly invalidated invariant) is the skill the same reviewer is no longer building by writing the slow version themselves. Cloudflare’s Project Glasswing write-up lands on the same shape from the security side. When they let a security-focused model write...
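
To make the opening example concrete, here is a minimal sketch of the two query shapes side by side. Everything specific in it is an assumption made for illustration: the `categories(id, parent_id, name)` schema, the helper names, the use of SQLite through Python's standard `sqlite3` module, and a single chain of categories standing in for a real menu tree. It is not the query from the incident, only the same shape.

```python
import sqlite3

# Hypothetical schema: a self-referencing categories table.
def build_chain(depth):
    """Create an in-memory table holding one chain of categories `depth` levels deep."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    rows = [(i, i - 1 if i > 1 else None, f"level-{i}") for i in range(1, depth + 1)]
    conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", rows)
    return conn

# Fixed-depth shape: three LEFT JOINs reach the root plus three levels below it.
FIXED_DEPTH = """
SELECT c1.id, c2.id, c3.id, c4.id
FROM categories AS c1
LEFT JOIN categories AS c2 ON c2.parent_id = c1.id
LEFT JOIN categories AS c3 ON c3.parent_id = c2.id
LEFT JOIN categories AS c4 ON c4.parent_id = c3.id
WHERE c1.parent_id IS NULL
"""

# Recursive shape: anchor on the root row, then join the table back to the
# growing result until a pass produces no new rows.
RECURSIVE = """
WITH RECURSIVE tree(id, depth) AS (
    SELECT id, 1 FROM categories WHERE parent_id IS NULL
    UNION ALL
    SELECT c.id, tree.depth + 1
    FROM categories AS c
    JOIN tree ON c.parent_id = tree.id
)
SELECT id, depth FROM tree
"""

def ids_fixed_depth(conn):
    """Collect every category id the fixed-depth query can see."""
    ids = set()
    for row in conn.execute(FIXED_DEPTH):
        ids.update(value for value in row if value is not None)
    return ids

def ids_recursive(conn):
    """Collect every category id the recursive CTE reaches."""
    return {row[0] for row in conn.execute(RECURSIVE)}

def compare(depth):
    """Return (count seen by fixed-depth query, count seen by recursive CTE)."""
    conn = build_chain(depth)
    try:
        return len(ids_fixed_depth(conn)), len(ids_recursive(conn))
    finally:
        conn.close()

if __name__ == "__main__":
    for depth in (3, 7):
        print(depth, compare(depth))
```

On a three-level fixture the two queries return the same three categories, which is why a test suite built on that fixture cannot tell them apart. At seven levels the fixed-depth query still returns rows and raises no error; it reaches only four categories, while the recursive one reaches all seven.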
