Code Review Is Not About Catching Bugs
My former Parse colleague Charity Majors – now CTO of Honeycomb and one of the strongest voices in the observability space – recently posted something that caught my attention. She’s frustrated with the discourse around AI-generated code shifting the bottleneck to code review, and she argues the real burden is validation – production observability. You don’t know if code works until it’s running with enough instrumentation to see what it’s actually doing.
She followed up by endorsing Boris Tane’s piece arguing that the entire SDLC is collapsing, with monitoring as the only stage that survives. Boris goes further than Charity – he argues the pull request flow is a relic, that human code review should become exception-based, and that clinging to it is “an identity crisis.”
Charity’s right that you don’t know whether code works until you can see it running. And I’d go further: if your primary strategy for knowing whether code works is having other humans read it, you have bigger problems than AI-generated code.
But I think the “code review is the bottleneck” crowd and the “no, validation is the bottleneck” crowd are both working from the same flawed premise: that code review exists primarily to answer “does this code work?”
To be fair, finding defects has always been listed as a goal of code review – Wikipedia will tell you as much. And sure, reviewers do catch bugs. But I think that framing dramatically overstates the bug-catching role and understates everything else code review does. If your review process is primarily a bug-finding mechanism, you’re leaving most of the value on the table.
The Question Code Review Actually Answers
Code review answers: “Should this be part of my product?”
That’s a judgment call, and it’s a fundamentally different question than “does it work.” Does this approach fit our architecture? Does it introduce complexity we’ll regret in six months? Are we building toward the product we intend, or accumulating decisions that pull us sideways? Does this abstraction earn its keep, or are we over-engineering for a future that may never arrive? Does this feel right – not just functionally correct, but does it reflect the taste and standards we want our product to embody?
Tests answer “does the code do what the author intended.” Production observability answers “what is the system actually doing.” Code review answers “was the author’s intent the right thing to build?”
You need all three. None of them substitutes for the others.
Observability gives you incredible power to understand a running system. But by the time you’re observing something in production, the decision to build it that way has already been made. You’re watching the consequences of choices – you’re not influencing the choices themselves. That’s what code review is for.
What I Learned Watching This Play Out
At Firebase, I spent 5.5 years running an API council that reviewed somewhere around 850 proposals across every language, ecosystem, REST API, database schema, and CLI command that Firebase shipped. That was technically API review, not code review – a separate process focused specifically on the interfaces we exposed to developers. But the underlying muscle was the same: applying judgment about what should exist in the product.
The most valuable feedback from that council was never “you have a bug in this spec.” It was “this API implies a mental model that contradicts what you shipped last quarter” or “this deprecation strategy will cost more trust than the improvement is worth” or simply “a developer encountering this for the first time won’t understand what it does.” Those are judgment calls about whether something should be part of the product – the same fundamental question that code review answers at a different altitude. No amount of production observability surfaces them, because the system can work perfectly and still be the wrong thing to have built.
I saw this earlier at Parse too, where API review happened right inside the pull request – a tiny team making decisions that hundreds of thousands of apps would depend on, and every PR was a chance to ask “are we making a promise to developers that we actually want to keep?” Later, at Google Cloud, API review lived inside code review as well. The underlying question was always the same: does this belong in our product?
And the tools we have for this are expanding fast. The kinds of things we can review in an automated way – style consistency, security patterns, API compatibility, performance regressions – are changing dramatically. Which means the interesting question isn’t just “what does review need to become?” but “what should humans focus on in review now that tooling can handle more of the mechanical checks?” The answer, I think, is the stuff that’s hardest to automate: judgment, taste, architectural coherence, and the collaborative aspects of building shared understanding.
Charity’s right that you need observability. Honeycomb has built incredible tools for exactly...