The Era of Token Scarcity

How and Why VCs Need to Update Their Playbook

Taylan Durmus Jun 09, 2026

For years, we operated under the assumption that frontier AI models were a low-cost utility, and we failed to ask whether the price we were paying was fair. When we did look at costs, the discussion turned to water use and emissions. These concerns are more than valid, but they also drew attention away from the real dollar costs, which have now begun to take the spotlight, with implications for us all. Frontier labs are in a furious land grab, betting that enterprise customers won’t notice the bill until their core operations are addicted to their models. The issue is that usage costs have spiked faster than even the labs anticipated, and the standard “subsidy-to-hook” playbook is potentially faltering. Facing the cold light of public market scrutiny, these labs can no longer afford to subsidise our token consumption. API bills are raising C-suite eyebrows, compressing operating margins, and detonating valuation multiples. For us in the world of venture capital, the assumption that compute costs were “someone else’s problem” has evaporated. Now is the time to sit up and update our investment strategies.

Why now?

The transition to an era of token scarcity is defined by a constrained supply of compute, rising hardware/infrastructure costs, and a scramble to determine who should ultimately absorb the cost of intelligence. The sudden visibility of these costs has resulted in immediate friction with customers. As Axios reported, “AI sticker shock” has hit the corporate world, driven by organisations realising that deploying AI at scale is vastly more expensive than anticipated. High-profile incidents, such as the CTO of Uber revealing that the company had burned through its entire 2026 AI budget within the first four months of the year, serve as a cautionary tale of what happens when deployment outpaces governance.

One of the core drivers behind this increase in cost is the evolution of how humans and systems interact with models. When users interacted with AI as a conversational chatbot, the compute costs were relatively bounded by human typing speeds, reading comprehension, and the linear nature of a text-based prompt-and-response cycle. However, the jump from a reactive “chatbot” to an autonomous “agent” has fundamentally altered the unit economics of inference.

A 2026 research paper authored by researchers from Stanford, MIT, and Google DeepMind provides the empirical evidence. The researchers conducted a study of token consumption patterns specifically within agentic coding tasks. Their findings shattered previous assumptions. It turns out that agentic workflows consume up to 1000x more tokens than simple chat-based code reasoning tasks. The research highlights that token usage in these autonomous environments is also highly variable and inherently unpredictable. Executing the same task with the same model can see token usage vary by a factor of up to 30x depending on the pathway the agent takes. Agents can operate in loops, trying new approaches when others fail, and accumulate costs without necessarily always making progress on a task. This means that customers only see the true cost after the inference has already taken place and the capital expended.

How much does it actually cost?
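
The team-level figures in this section are simple multiplication, and it can help to make the arithmetic explicit. The sketch below is a minimal illustration, not a pricing model: it annualises a flat per-developer monthly spend for a whole team, using the $150 to $250 range and the 20-developer team discussed here. The function name `annual_team_cost` is hypothetical, and the flat-spend assumption ignores the variability of agentic usage described earlier.

```python
def annual_team_cost(monthly_per_dev, team_size, months=12):
    """Annualise a flat per-developer monthly spend for a whole team.

    Assumption (for illustration only): every developer spends the same
    amount every month. Real agentic usage is far more variable.
    """
    return monthly_per_dev * team_size * months


# Figures taken from the estimates discussed in this section.
low = annual_team_cost(150, 20)
high = annual_team_cost(250, 20)

print(f"${low:,.0f} to ${high:,.0f} per year")  # $36,000 to $60,000 per year
```

Swapping in your own headcount and observed monthly spend gives a first-order budget line; actual bills will depend on how agentic the workload is.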

Anthropic’s recent enterprise estimates for its Claude Code product illustrate this new reality. The company’s own internal estimates put average usage at roughly $150 to $250 per developer per month, with an average cost of around $13 per active developer day. For a typical engineering team of 20 developers, this translates to an unbudgeted, variable operational expense of roughly $36,000 to $60,000 annually. This is before factoring in the necessary integration work, governance tooling, extra API usage, or the inevitable secondary AI products the team will adopt.

I think those figures are extremely conservative at best and bordering on disingenuous at worst. I’ve personally seen a planning exercise burn $15 in API costs using Opus 4.8 in a session that lasted 15 minutes. For context, Jensen Huang is already proposing that tokens should be offered alongside salary as a perk of the job to attract the best talent. I should highlight that I do not code as a full-time SWE. I’m the Head of Venture Capital at Campden Hill Capital, so my coding nowadays is limited to a simple system of record for CHC and an open-source project I maintain mainly to stay up to date with the latest tools (shameless plug for Nojoin).

The backlash

A great example of this new paradigm is GitHub Copilot. It transitioned to a usage-based billing model starting this month, replacing its flat-rate subscription system and sparking an immediate uproar....
