Writing code versus shipping code: Productivity effects of AI coding tools

Writing code versus shipping code: Productivity effects across generations of AI coding tools | CEPR

In 1987, Robert Solow famously quipped that "you can see the computer age everywhere but in the productivity statistics" (Solow 1987). Four decades later, the same concern animates the policy debate over generative AI. Experimental studies find that AI tools raise worker performance on specific tasks by 15–50% in customer support (Brynjolfsson et al. 2025), professional writing (Noy and Zhang 2023) and especially software development (Peng et al. 2023, Cui et al. 2026). Forecasts of AI's aggregate impact nonetheless vary widely, from percentage points of additional annual productivity growth to barely measurable effects (Acemoglu 2025, Jones 2026; see Filippucci et al. 2024 for an overview of this debate on Vox). Early firm-level evidence likewise suggests only modest effects so far (Aldasoro et al. 2026).

A central reason for this divergence is a question that has been hard to test directly: do productivity gains on individual tasks translate into more final output? According to the 'weak links' or bottleneck hypothesis, they need not. If production consists of complementary stages and AI accelerates only some of them, final output remains limited by the weakest stage – the one humans still perform. This is the 'O-ring' logic of Kremer (1993), applied to AI by Aghion et al. (2019) and Jones (2026). In a recent paper (Demirer et al. 2026), we provide one of the first direct empirical tests of this hypothesis in software development, one of the earliest and most prominent domains of AI adoption.

Three generations of AI coding tools

Software development is an ideal setting for two reasons.
First, it has already experienced three distinct generations of widely adopted AI tools: autocomplete (suggesting code as the developer types, available since 2021), sync agents (writing and editing code alongside the developer in real time, such as Claude Code), and async agents (working autonomously on an assigned task without developer oversight, such as GitHub's Coding Agent or OpenAI's Codex). Second, software production has a well-defined hierarchy of stages. Lines of code are bundled into commits, commits into pull requests (bundles of changes submitted for review and integration), pull requests into projects, and projects into shipped releases, so productivity can be measured at each stage of the chain.

We combine the public GitHub histories of more than 100,000 developers with internal usage records from Microsoft, which owns GitHub. For some tools, we observe adoption directly in subscription data; for others, we identify it from publicly visible traces of usage on GitHub (Claude Code, for instance, leaves recognizable footprints in commit histories). To estimate productivity effects, we use a matched event-study design: each adopter is compared to a control developer with near-identical activity exactly one year earlier, which avoids comparing adopters with 'non-adopters' who may in fact be quietly using AI themselves. Placebo tests with non-AI tools, flat pre-trends, and agreement between our autocomplete estimate and the field-experimental estimate of Cui et al. (2026) for the same tool in the same period all support a causal interpretation (we report these checks in detail in our paper).

Task-level gains grow with each generation

Each generation of tools delivers larger productivity gains than the last. Measured by commits, a standard gauge of coding activity, adopting autocomplete raises a developer's output by roughly 40%. Adding sync agents takes the cumulative effect to roughly 140%, and adding async agents to roughly 180%.
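The matched event-study comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, and it should not be read as the estimator used by Demirer et al. (2026). Only the one-year lag comes from the text. The `Developer` record, the six-month matching window, Euclidean distance on monthly commit counts, and the simple difference-in-differences in levels are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class Developer:
    """Hypothetical record: one developer's commit counts keyed by month index."""
    dev_id: str
    monthly_commits: dict[int, int] = field(default_factory=dict)

    def window(self, start: int, length: int) -> list[int]:
        # Months with no recorded activity count as zero commits.
        return [self.monthly_commits.get(m, 0) for m in range(start, start + length)]


def find_matched_control(adopter, adoption_month, pool, window=6, lag=12):
    """Return the other developer whose activity in the `window` months before
    (adoption_month - lag) is closest, by Euclidean distance, to the adopter's
    activity in the `window` months before adoption; None if the pool holds
    no other developer."""
    target = adopter.window(adoption_month - window, window)
    control_event = adoption_month - lag
    best, best_dist = None, math.inf
    for cand in pool:
        if cand.dev_id == adopter.dev_id:
            continue  # the control must be a different developer
        dist = math.dist(target, cand.window(control_event - window, window))
        if dist < best_dist:
            best, best_dist = cand, dist
    return best


def did_effect(adopter, control, adoption_month, window=6, lag=12):
    """Difference-in-differences in mean monthly commits: the adopter's
    post-minus-pre change, net of the control's change over the same event
    window shifted back by `lag` months."""
    t_a, t_c = adoption_month, adoption_month - lag
    change_a = fmean(adopter.window(t_a, window)) - fmean(adopter.window(t_a - window, window))
    change_c = fmean(control.window(t_c, window)) - fmean(control.window(t_c - window, window))
    return change_a - change_c


if __name__ == "__main__":
    pre = [10, 12, 11, 10, 12, 11]
    # Adopter: steady activity in months 14-19, adopts in month 20, busier after.
    adopter = Developer("a", {14 + i: c for i, c in enumerate(pre + [20, 22, 21, 20, 22, 21])})
    # Candidate whose months 2-7 mirror the adopter's pre-period and stay flat after.
    twin = Developer("b", {2 + i: c for i, c in enumerate(pre + pre)})
    # Candidate with much higher activity, hence a poor match.
    busy = Developer("c", {2 + i: 30 for i in range(12)})
    control = find_matched_control(adopter, 20, [adopter, twin, busy])
    print(control.dev_id)                    # b
    print(did_effect(adopter, control, 20))  # 10.0
```

In this sketch the control's outcomes are read one year before the adopter's, so the comparison period presumably predates the tool being adopted. That is the intuition behind not using concurrent non-adopters. A realistic version would match on more activity measures and average effects over many adopter-control pairs.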
The gains are larger for less active developers but remain substantial across the entire activity distribution, and they grow over time in step with major model releases.

Writing code versus shipping code

These gains attenuate sharply at higher levels of the production hierarchy. Figure 1 summarises our estimates: combining all three generations of tools, output at the commit level roughly triples, and raw code volume rises by far more, but the same developers work on only about 50% more projects and ship only about 30% more releases. For sync agents alone, a more than sevenfold increase in lines of code becomes a 65% increase in pull requests, yet releases rise by only 20%.

Figure 1: Productivity effects of AI coding tools across the production hierarchy

Notes: Matched event-study estimates of the cumulative effect of adopting AI coding tools on each layer of the software production hierarchy. Because more capable tools are adopted alongside earlier generations, the figure shows cumulative effects of adopting all tools up to and including a given generation. Source: Demirer et al. (2026).

This attenuation is what a 'weak links' view of production predicts. In our model of software production, each stage's output is combined with human...
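The attenuation can be illustrated with a deliberately simple serial-pipeline calculation. This sketch swaps in Amdahl's law in place of the O-ring or weak-links production functions of Kremer (1993) and Jones (2026). The stage shares and speedups are hypothetical and are not estimates from Demirer et al. (2026). The setup is that each release spends a fixed share of its time in stages AI accelerates and the rest in human-bound stages such as review and integration. Under that setup, the gain in releases is capped no matter how fast code gets written.

```python
def pipeline_gain(stage_shares, stage_speedups):
    """Proportional increase in completed releases per unit of time when each
    stage of a serial pipeline is sped up by its own factor (Amdahl's law).

    stage_shares: fraction of the time per release spent in each stage (sums to 1)
    stage_speedups: factor by which each stage is accelerated (1.0 = unchanged)
    """
    if len(stage_shares) != len(stage_speedups):
        raise ValueError("need one speedup per stage")
    if abs(sum(stage_shares) - 1.0) > 1e-9:
        raise ValueError("stage shares must sum to 1")
    if any(s <= 0 for s in stage_speedups):
        raise ValueError("speedups must be positive")
    # New time per release: each stage's share shrinks by its own speedup.
    new_time = sum(share / speed for share, speed in zip(stage_shares, stage_speedups))
    return 1.0 / new_time - 1.0


if __name__ == "__main__":
    # Hypothetical split: half of each release's time is coding (AI-accelerated),
    # half is human-bound review, testing and integration (unchanged).
    shares = [0.5, 0.5]
    for coding_speedup in (2, 4, 8, 1000):
        gain = pipeline_gain(shares, [coding_speedup, 1.0])
        print(f"coding {coding_speedup}x faster -> releases +{gain:.0%}")
```

With these made-up numbers, the loop prints release gains of 33%, 60%, 78% and 100%. An eightfold coding speedup yields well under a doubling of releases. Even a thousandfold speedup cannot push the gain past 100%, because half of each release's time is untouched.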
