Cheaper LLM tokens led to bigger AI bills (Jevons paradox)

AI Token Economics: Why Cheaper Tokens Made Your Bill Explode — Northwood Systems

Uber burned through its entire annual AI budget in four months. Not by being wasteful, but by doing exactly what its leadership encouraged. The company had internal leaderboards celebrating heavy AI usage, executives publicly praised the productivity gains, and then the bill arrived. The result: a $1,500-per-month hard cap on each agentic coding tool, per employee, effective June 2026.[1]

That story isn't a cautionary tale about one company's poor planning. It's a preview of what happens when metered, per-token pricing meets agentic workloads at scale, and it's landing in your budget right now.

Start with the numbers.

4 months: time to burn through the annual AI budget (Uber, 2026)

~80%: fall in LLM API prices, 2025→2026 (industry pricing aggregators, 2026)

[value missing]: tokens the median developer burns per month (Morph LLM, 2026)

The Jevons paradox is running your AI budget

In 1865, economist William Stanley Jevons noticed something counterintuitive. As steam engines became more efficient, cheaper to run per unit of work, total coal consumption went up, not down. Efficiency unlocked demand that hadn't existed before.

The Jevons paradox is what's happening to your AI spend. Token prices dropped roughly 80% between 2025 and 2026.[2] Your engineers didn't pocket those savings; they used them as permission to run more, longer, and more ambitiously. A task that cost $10 now costs $2, so your team runs it five times instead of once, then hands it to an agent that runs it fifty times automatically.

The strongest counter-argument: "If unit costs fell 80%, even tripling usage leaves the bill lower than before." That's true for chat-style, single-turn interactions: at a fifth of the old price, usage has to grow 5x before the bill catches up. It breaks completely once you introduce agentic loops, because an agent doesn't triple token consumption. It multiplies it by 50x,[3] which at a fifth of the unit price is a bill ten times larger. A single agentic coding session now pushes 1–3.5 million tokens per task;[4] one agentic coding tool, used heavily, clears Uber's $1,500 monthly cap on its own.
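The break-even arithmetic behind that counter-argument can be sketched in a few lines of Python. This is an illustrative calculation, not a billing model: the function names are hypothetical, and the 80% price drop and 50x agentic multiplier are simply the figures quoted above.

```python
def relative_bill(price_drop: float, usage_multiplier: float) -> float:
    """Return the new bill as a multiple of the old bill.

    price_drop is the fractional fall in unit price (0.80 for 80%).
    usage_multiplier is how many times more tokens are consumed.
    """
    if not 0.0 <= price_drop < 1.0:
        raise ValueError("price_drop must be in [0, 1)")
    return (1.0 - price_drop) * usage_multiplier


def break_even_multiplier(price_drop: float) -> float:
    """Usage growth at which the bill returns to its old level."""
    if not 0.0 <= price_drop < 1.0:
        raise ValueError("price_drop must be in [0, 1)")
    return 1.0 / (1.0 - price_drop)


if __name__ == "__main__":
    # Assumed inputs: the 80% price drop and the usage multipliers from the text.
    for usage in (3, 5, 50):
        print(f"{usage}x usage -> {relative_bill(0.80, usage):.1f}x the old bill")
    print(f"break-even at {break_even_multiplier(0.80):.1f}x usage")
```

Swapping in your own price drop and usage multiplier shows how far consumption can grow before the bill does.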

The math isn't subtle.

What one agentic coding turn actually costs

Take Claude Opus 4.8, a model your senior engineers might reasonably reach for on a complex refactoring task. Input tokens run $5 per million; output tokens run $25 per million.

A single agentic turn with a reasonable context: 200,000 input tokens × $5/M = $1.00. The model responds with 50,000 output tokens × $25/M = $1.25. Total: $2.25 per turn.

Now multiply that across a real workday: 40 turns per day, 20 working days. That's $1,800 a month, from one engineer, using one tool, on one model. Uber's $1,500 cap doesn't cover it.

$2.25 per agentic turn (200K in + 50K out · Opus 4.8)

$90 per developer-day (× 40 turns)

$1,800 per developer-month (× 20 days), past Uber's $1,500 cap
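The per-turn arithmetic above is easy to reproduce and adapt to your own workload. The sketch below is a minimal calculator under stated assumptions: the `ModelPrice` type and function names are hypothetical, the prices and token counts are the ones quoted in this section, and it ignores any caching or batch discounts a provider may offer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


def turn_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one turn at list price."""
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


def monthly_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 turns_per_day: int, working_days: int) -> float:
    """Cost in USD of one developer's month at a steady turn rate."""
    return turn_cost(price, input_tokens, output_tokens) * turns_per_day * working_days


# Figures from the text: $5/M input, $25/M output, 200K in, 50K out.
OPUS_4_8 = ModelPrice(input_per_million=5.0, output_per_million=25.0)

if __name__ == "__main__":
    per_turn = turn_cost(OPUS_4_8, 200_000, 50_000)
    per_month = monthly_cost(OPUS_4_8, 200_000, 50_000,
                             turns_per_day=40, working_days=20)
    print(f"per turn:  ${per_turn:.2f}")     # per turn:  $2.25
    print(f"per month: ${per_month:,.0f}")   # per month: $1,800
```

Note that the 50,000 output tokens account for $1.25 of the $2.25, more than half the cost from a fifth of the volume.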

The pricing chart below shows why output tokens are the number that matters. Input is the sticker price. Output is the bill.

Fig. 1. Input is the sticker price. Output is the bill. Cost per 1M tokens (USD), input vs. output, for Gemini Flash-Lite, Gemini 3.1 Pro, Claude Sonnet 4.6, GPT-5.4, Claude Opus 4.8, and GPT-5.5. Source: provider pricing, compiled June 2026 · cloudzero.com

Output tokens cost 4–10× input tokens across every major model. On agentic workloads, output volume is the variable that escapes.

Developer spend follows a power law

Not every engineer hits $1,800 a month. A solo developer on a single subscription tool pays roughly $100. A heavy multi-tool user lands around $400. The power agentic user, the one actually getting the productivity gains, runs $1,500. And Microsoft reportedly cancelled employee AI licences after discovering some engineers were running $2,000 per month each.[7]

Fig. 2. Typical monthly AI-coding spend per developer (upper bound of reported range, USD, 2026): solo (subscription), heavy multi-tool user, power agentic user, MS engineers (licences cancelled). Source: Morph LLM (ranges); Microsoft via reporting · morphllm.com

Monthly AI coding spend per developer varies by more than 20× depending on tool usage pattern. The productivity gains concentrate in the expensive tail.

That distribution matters for how you think about governance. The engineers generating the most business value from AI are, structurally, the same engineers generating the largest bills. Blunt per-tool caps cut off the spend and the value together.
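To see where a cap actually bites, the spend tiers above can be checked against a threshold. This is a rough sketch under loud assumptions: the tier values are the approximate figures quoted in the text, the helper names are hypothetical, and it treats the cap as applying to a developer's total monthly spend, which simplifies Uber's per-tool rule.

```python
# Approximate monthly spend per developer (USD), taken from the figures in the text.
MONTHLY_SPEND_USD = {
    "solo (subscription)": 100,
    "heavy multi-tool user": 400,
    "power agentic user": 1_500,
    "reported Microsoft engineers": 2_000,
}


def spread_ratio(spend: dict[str, float]) -> float:
    """Ratio between the highest and lowest spend tier."""
    return max(spend.values()) / min(spend.values())


def over_cap(spend: dict[str, float], cap: float) -> dict[str, float]:
    """Map each tier that exceeds the cap to the amount the cap would block."""
    return {who: amount - cap for who, amount in spend.items() if amount > cap}


if __name__ == "__main__":
    print(f"spread: {spread_ratio(MONTHLY_SPEND_USD):.0f}x")
    for who, blocked in over_cap(MONTHLY_SPEND_USD, cap=1_500).items():
        print(f"{who}: ${blocked:,.0f} over the cap")
```

Under this simplification the two lower tiers never notice the cap, the power agentic user sits exactly on it, and only the heaviest tier is cut off, which is where the text locates the productivity gains.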

Sixty-three percent of organisations now name AI an active FinOps concern, up from 31% in 2024, according to the FinOps Foundation.[5] That doubling isn't panic; it's recognition that per-token billing has no natural ceiling, and finance teams weren't built to forecast it.

Converting variable cost into fixed cost

Every dollar you spend on external LLM APIs is a variable cost that scales with usage. There is no cap baked into the architecture. You impose caps manually, reactively, after the budget has already moved.

The structural alternative is converting that variable cost into a fixed, plannable one: infrastructure you own, models you...
