The optimal number of unreviewed PRs is not zero

Code got cheap. Review didn’t. What queueing theory says about the AI-era backlog.

John Begeman · June 2026

“The dominant paradigm for managing product development is fundamentally wrong. Not just a little wrong, but wrong to its very core.”

Every engineering lead I talk to has the same complaint right now: AI made writing code fast, and now code review is the bottleneck. PRs pile up, reviewers drown, and the usual response is some mix of “we need more reviewers” and “we need AI to review the AI’s code.”

I went back to Don Reinertsen’s The Principles of Product Development Flow with this problem, and his framework rejects the framing before it does anything else. You don’t have a code review problem. You have a queue, and you’ve stopped looking at it.

(Donald G. Reinertsen, The Principles of Product Development Flow: Second Generation Lean Product Development (Celeritas Publishing, 2009). Dense, unglamorous, and more useful per page than anything else written about how engineering organizations actually move.)

The queue you can’t see

Reinertsen’s central argument is that development organizations obsess over what they can see (output, utilization, busy people) while ignoring the most expensive thing in the system: work sitting in queues. An unreviewed PR is inventory. It ages, goes stale against main, and blocks whatever is behind it. Most teams can quote their deploy frequency to two decimals and have no idea what their median PR wait time is, or whether it’s growing.

(Isn’t this just DORA? Not quite. DORA’s lead time for changes runs commit-to-production, which contains the review queue but doesn’t isolate it: a fine end-to-end alarm, but useless for locating the bottleneck. The numbers you want are pickup time (open to first review), queue depth, and queue age. Deploy frequency, meanwhile, is a DORA metric, which is exactly why everyone can quote it. I didn’t, until I checked.)

Step one is boring. Instrument the queue: depth, age, arrival rate against service rate. Until then you’re managing the wrong variable.

(Little’s Law ties these together: L = λW. The average number of items in a system equals the arrival rate times the average time each item spends there. It holds for any stable queue, no assumptions about distributions required. If PRs arrive at 9 a day and sit for 2 days, you carry 18 open PRs. Always.)
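A minimal sketch of what instrumenting the queue could look like, assuming you can export an opened timestamp and a first-review timestamp for each PR. The `PullRequest` record, its field names, and the seven-day window are hypothetical choices for illustration, not any particular forge’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class PullRequest:
    # Hypothetical record; map these fields from whatever your forge exports.
    opened_at: datetime
    first_review_at: Optional[datetime] = None  # None while still waiting


def _days(delta: timedelta) -> float:
    return delta.total_seconds() / 86400


def queue_snapshot(prs, now, window_days=7):
    """Summarise the review queue as of `now`, with rates over a trailing window."""
    window_start = now - timedelta(days=window_days)
    waiting = [p for p in prs if p.first_review_at is None]
    picked_up = [p for p in prs if p.first_review_at is not None]

    # Queue age: how long each still-unreviewed PR has been sitting.
    ages = [_days(now - p.opened_at) for p in waiting]
    # Pickup time: open to first review, for PRs that have been picked up.
    pickups = [_days(p.first_review_at - p.opened_at) for p in picked_up]

    arrivals = sum(1 for p in prs if window_start <= p.opened_at <= now)
    served = sum(1 for p in picked_up if window_start <= p.first_review_at <= now)

    return {
        "depth": len(waiting),
        "median_age_days": median(ages) if ages else 0.0,
        "median_pickup_days": median(pickups) if pickups else 0.0,
        "arrival_rate_per_day": arrivals / window_days,
        "service_rate_per_day": served / window_days,
    }


def little_depth(arrival_rate_per_day, avg_days_in_system):
    """Little's Law, L = lambda * W: expected number of open PRs."""
    return arrival_rate_per_day * avg_days_in_system
```

If the arrival rate sits persistently above the service rate, depth and age will keep growing no matter how busy the reviewers look. `little_depth(9, 2)` reproduces the 18 open PRs from the example above.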

Why it got bad so fast

Wait time explodes nonlinearly as utilization approaches 100%. That’s not a management opinion, it’s queueing theory, and it means a reviewer whose review capacity is 95% spoken for is not your most efficient reviewer. They’re the reason the queue is three days deep.

(Utilization here means demand against review capacity (the handful of hours a week genuinely available for reviewing after someone’s own work), not 95% of their waking hours. Nobody reviews nine hours a day. That’s part of the problem: the queue is served by a far smaller pipe than the org chart suggests, so it saturates faster than anyone expects.)

Figure 1. Time a PR spends in the system, in multiples of the bare review time, as reviewer utilization ρ rises. For a single-server queue with random arrivals, W = W₀ / (1 − ρ). The curve is flat for a long time, and then it isn’t. Everything interesting about your backlog happens to the right of 85%.

[Figure: x-axis, reviewer utilization ρ (25% to 100%); y-axis, time in system ÷ review time (1× to 20×). Annotations: ρ = 70%, waits ≈ 3×; ρ = 95%, waits ≈ 20× (your “most efficient” reviewer).]
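The curve in Figure 1 is easy to tabulate yourself. This is a sketch that takes the caption’s formula W = W₀ / (1 − ρ) at face value; I am assuming the underlying model is the textbook single-server queue with random arrivals and exponentially distributed review times (M/M/1), which real review queues only approximate. The function name is mine.

```python
def time_in_system_multiple(rho):
    """Time in system as a multiple of bare review time: W / W0 = 1 / (1 - rho)."""
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return 1.0 / (1.0 - rho)


# Tabulate a few points along the curve, including the two annotated in Figure 1.
for rho in (0.25, 0.50, 0.70, 0.85, 0.95):
    print(f"rho = {rho:.0%}: {time_in_system_multiple(rho):.1f}x")
```

Going from 25% to 50% utilization costs well under one extra review-time of delay, while going from 85% to 95% roughly triples the time in system. That asymmetry is the whole argument for leaving reviewers some slack.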

Now add AI. The tools cranked the arrival rate into a stage whose capacity is fixed and was probably already running hot. Nobody removed a bottleneck here. We relocated one and made it worse, which is what speeding up one stage in isolation always does: the inventory just lands at the next handoff.

Little’s Law makes the damage concrete, and the numbers are worth staring at. Take a team whose reviewers can clear ten PRs a day. Before AI they received seven a day: 70% utilization, PRs waiting about a third of a day, two or three open at any time. Fine. Now the same team receives nine and a half a day. Arrivals went up 36%. The queue did not go up 36%.

Figure 2. The same review team before and after AI, capacity fixed at 10 reviews/day. Arrivals rise from 7 to 9.5 per day (×1.36). Time-in-queue rises from a third of a day to two days (×6); by L = λW, PRs waiting rise from about 2 to 19 (×8). A modest change in input, an order of magnitude in the queue. This is why it felt sudden.

[Figure: before AI vs. after AI, relative to a 1× baseline. Arrival rate ×1.36; time in queue ×6; PRs in queue ×8 (2.3 → 19 open).]
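The before-and-after numbers can be reproduced in a few lines. This sketch assumes the same single-server (M/M/1) model as Figure 1, where the time a PR spends in the system is 1 / (capacity − arrivals), and then applies Little’s Law. The function name is hypothetical, and the model treats the whole review team as one pooled server.

```python
def mm1_review_queue(arrivals_per_day, capacity_per_day):
    """Return (utilization, days_in_system, open_prs) under an M/M/1 assumption."""
    if arrivals_per_day >= capacity_per_day:
        raise ValueError("arrivals must stay below capacity or the queue never stabilises")
    utilization = arrivals_per_day / capacity_per_day
    days_in_system = 1.0 / (capacity_per_day - arrivals_per_day)
    open_prs = arrivals_per_day * days_in_system  # Little's Law: L = lambda * W
    return utilization, days_in_system, open_prs


before = mm1_review_queue(7, 10)    # utilization 0.70, about 0.33 days, about 2.3 open
after = mm1_review_queue(9.5, 10)   # utilization 0.95, 2 days, 19 open

print(f"arrival rate x{9.5 / 7:.2f}")
print(f"time in system x{after[1] / before[1]:.1f}")
print(f"open PRs x{after[2] / before[2]:.1f}")
```

Try nudging arrivals from 9.5 to 9.8 with capacity unchanged: the time in system goes from 2 days to 5. Near saturation, small changes in input dominate everything else you might tune.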

The queue feeds itself

The static picture is bad enough, but a hot queue doesn’t stay static. A...
