Talk is Cheap - by Jake - Sovereign Games
Talk is Cheap<br>What the data says about the operational impacts of LLM use in the software industry
Jake<br>May 31, 2026
This is a continuation of the discussion in How I’m thinking about the value of LLMs. I’m arguing elsewhere that LLMs will never be geniuses. This is not part 2 of The Ontology Argument.<br>In How I’m thinking I said I wasn’t ready to take a stance on LLM value creation. That changes in this post. Here is the stance I’m taking:
On average, how we’re using LLMs is likely destroying value.
My stance originates from stumbling on Faros.ai - a software development telemetry firm. Their products plug into common development tools like Jira, GitHub, and CI/CD pipelines to directly measure major operational metrics for software development teams.<br>Faros published a report in March that directly compares transaction-level data, across their customer base, between teams that use AI in their software development process and teams that do not. The sample covers 22,000 developers and 4,000 teams. This is, by far, the best data I’ve been able to locate that directly measures the operational impact of LLM use in the software development process.<br>It’s bad. Really, really bad.<br>The Faros Data
The whole report is worth a read, but I’m going to cover just three major headline conclusions.<br>First - developer level productivity has improved
I think this supports what I said in How I’m thinking - there is clearly an individual productivity speedup that happens with LLMs. Although, I will say - it doesn’t look like it’s 10x from here. It’s a much more modest improvement than what the optimistic AI case would tell you.<br>The canary in the coal mine is that -11% in deployment frequency. That’s a system-level metric. It directly measures how often the firms are delivering value to their customers.<br>I won’t even touch that code deletion ratio. You can draw your own conclusions there.<br>Let’s discuss a bit of nuance here.<br>Why am I using the term “productivity” and not “throughput”, as the diagram does? I’m an operator trained in the ways of The Goal. To me, throughput is reserved for total system flow - i.e. how many finished goods are flowing out of the system as a whole. When a developer finishes a task, the feature has not yet shipped. This point will become very relevant later.
You’ll see that asterisk that says 10% of the dataset - only a subset of Faros’ customers pipe the telemetry product into their CI/CD pipeline. So for direct system-level data, you’re actually relying on a sample of 2,200 developers and 400 teams. My stance is that this subsample is large enough to draw conclusions from, and that the averages calculated for it likely describe the center of the distribution for the rest. You may disagree with that. I’ll try to flag where the weaknesses are in my argument if you depart from me there.
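For readers who want to check the subsample arithmetic, here is a rough sketch. It assumes the 10% share applies evenly to developers and teams, and it treats teams as independent random draws, which may not hold since the customers with CI/CD telemetry are self-selected. Under those assumptions, the standard error of an average scales with one over the square root of the sample size. The function and variable names are my own illustration, not anything from the Faros report.

```python
import math

# Figures quoted in the post; the 10% share comes from the report's asterisk.
TOTAL_DEVELOPERS = 22_000
TOTAL_TEAMS = 4_000
CI_CD_SHARE = 0.10


def subsample_size(total: int, share: float) -> int:
    """Number of units left when only `share` of the sample reports a metric."""
    return round(total * share)


def standard_error_ratio(full_n: int, sub_n: int) -> float:
    """How much wider the standard error of a mean is on the subsample.

    Assumes independent, identically distributed observations, so the
    standard error scales as 1 / sqrt(n). Says nothing about selection bias.
    """
    return math.sqrt(full_n / sub_n)


sub_developers = subsample_size(TOTAL_DEVELOPERS, CI_CD_SHARE)
sub_teams = subsample_size(TOTAL_TEAMS, CI_CD_SHARE)
se_ratio = standard_error_ratio(TOTAL_TEAMS, sub_teams)

print(f"Developers with CI/CD telemetry: {sub_developers}")
print(f"Teams with CI/CD telemetry: {sub_teams}")
print(f"Standard error vs. full sample: about {se_ratio:.2f}x wider")
```

So the system-level averages are noisier than the full-sample ones by roughly a factor of three, not ten. The bigger open question is whether the teams that wired up CI/CD telemetry look like the ones that did not, and this sketch cannot answer that.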
Ok, so what do we have here? A meaningful but modest improvement in developer productivity. It’s good, but it’s not 10x. It’s, at best, 2x.<br>What else?<br>Second - overall system flow has slowed down at every step
This chart astonishes me. In my mind, it’s a brutal indictment of our use of LLMs. It’s hard for me to even put into words how bad this is.<br>These metrics are a direct correlate of system throughput. System throughput - from a business perspective - is the only thing that matters. You can’t sell a product until it’s in production. If the lead time for getting features into production has increased almost 5x, we have changed the fitness of our operations by almost an order of magnitude in the wrong direction.<br>I will discuss this in more detail in the What is happening? section, but the worst is yet to come:<br>Third - quality metrics have degraded precipitously
I feel like there’s nothing to even say here. There is no 10% subsample to hide behind on this. If these stats generalize to the overall population, just imagine how much we are collectively hurting our customers. It’s like we’re all looking away from the fact that the farther a defect travels down the operational pipeline, the more costly it becomes to the whole system, and that cost grows exponentially.<br>What is happening?
The Faros report is interesting for all sorts of reasons that I don’t have the space or inclination to discuss, but there’s another fact that’s very interesting. They built some statistical models trying to figure out whether certain features of the organizations predicted worse outcomes. Here’s an amazing observation:<br>DORA’s 2025 State of AI-Assisted Software Development report concludes that AI amplifies existing strengths and weaknesses, and that strong engineering foundations offer protection against AI’s Downsides. Our telemetry data, drawn from engineering systems across thousands of teams, does not support that as a protective factor. High-performing engineering organizations are experiencing the same downstream deterioration as everyone else.
Emphasis mine.<br>How bad is this really? I make the...