From Tokenmaxxing to Token Minimalism

Beyond Runtime

With a great number of tokens comes great responsibility

Thomas Johnson Jun 24, 2026

👋 Hi, I’m Thomas. Welcome to a new edition of Beyond Runtime, where I dive into the messy, fascinating world of distributed systems, debugging, AI, and system design. All through the lens of a CTO with 20+ years in the backend trenches.

QUOTE OF THE WEEK: “The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.” — Alberto Brandolini

For a few strange quarters, parts of the tech industry decided that the best way to prove AI adoption was to count how many tokens your engineers burned. Meta built a leaderboard ranking 85,000+ employees by token consumption, handing out titles like “Session Immortal” and “Token Legend.” Microsoft ran something similar. Salesforce set minimum monthly token spend targets and made everyone’s numbers visible to their teammates.

And Goodhart’s Law did what it always does. The moment token usage became the target, it stopped measuring anything useful. Engineers started asking AI questions already covered in documentation, prototyping features they had no intention of shipping, and defaulting to agents for tasks they could have done faster by hand. All to avoid being seen as insufficiently AI-native.

We’d been here before (see lines of code and story points). Every time the industry latches onto a proxy for productivity, it optimizes for the proxy and loses sight of the outcome.

The reckoning

Then the invoices showed up. Uber burned through its entire 2026 Claude Code budget in four months. COO Andrew Macdonald described learning about the budget blowout as a “head-exploding moment.” After talking with senior engineering leaders, he came away unconvinced the spend was translating into outcomes: “That link is not there yet, […] maybe implicitly there is more that is getting shipped, but it’s very hard to draw a line between one of those stats and, ‘Okay, now we’re actually producing 25% more useful consumer features.’”

Uber isn’t alone. Duolingo reversed a policy that had tied employee performance reviews to AI usage, after staff raised concerns that the metric rewarded tool adoption rather than actual results. Microsoft revoked developer Claude Code licenses months after enabling them. J.R. Storment of the FinOps Foundation described hearing from companies that they were three times over their entire 2026 token budget: “We started hearing existential crises, and the whole conversation shifted from tokenmaxxing and ‘go fast’ to ‘we need guardrails, how do we control this?’”

The data made it worse. A two-year study of 20,000 developers by Faros AI found that output was rising, but so were bugs and rewrites. Research by Jellyfish found that engineers who used the most tokens were about twice as productive as those who used AI less, but they spent ten times the tokens to get there.

On X, Aiswarya Sankar put a practitioner frame on it: “Try justifying spending $100k on token spend when only $18k even makes it to a stable prod feature. In the rush to maximize AI token spend, companies are wasting over 44% on bug fixes.”

Aiswarya Sankar @Aiswarya_Sankar

This is what we've been seeing with every company we work with.

Try justifying spending 100k on token spend when only 18k even makes it to a stable prod feature.

In the rush to maximize AI token spend, companies are wasting over 44% on bug fixes

Ed Zitron @edzitron

Uber’s COO has said that it’s getting “harder to justify” its AI costs because there was no way to show a link between AI spend and any meaningful increase in useful features. This is the first time I’ve seen a company say this directly.

https://t.co/xUhZvtpwah

3:38 PM · May 26, 2026
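To put the numbers in this section on a common footing, here is a minimal back-of-the-envelope sketch in Python. It uses only figures quoted above (Jellyfish's roughly 2x output for 10x tokens, and Sankar's $18k out of $100k). The function name `output_per_token` and the idea of normalizing against a baseline of 1.0 are my own illustrative assumptions, not a metric either source defines.

```python
def output_per_token(relative_output: float, relative_tokens: float) -> float:
    """Output per unit of token spend, relative to a baseline user at 1.0.

    Hypothetical helper for illustration; not a metric defined by Jellyfish.
    """
    if relative_tokens <= 0:
        raise ValueError("relative_tokens must be positive")
    return relative_output / relative_tokens


# Jellyfish figures as quoted above: about 2x the output for 10x the tokens.
heavy_user_efficiency = output_per_token(relative_output=2.0, relative_tokens=10.0)

# Sankar's example: $18k of a $100k token spend reaches a stable prod feature.
share_reaching_prod = 18_000 / 100_000

print(f"Heavy users: {heavy_user_efficiency:.2f}x baseline output per token")
print(f"Spend reaching stable prod: {share_reaching_prod:.0%}")
```

Read this way, the heaviest users in the Jellyfish data get about a fifth of the baseline output per token, even though their absolute output is higher. Whether that trade is worth it depends on what a token costs relative to an engineer's time, which none of the figures above settle.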

What’s next

The industry is now scrambling to instrument, audit, and govern token spend. The Linux Foundation announced the Tokenomics Foundation, a new standards body modeled on FinOps for cloud. Startups are rushing out token observability tools. Established vendors like Datadog and New Relic are tacking on token-level monitoring.

Measurement and governance are very reasonable and probably useful. But I think the more interesting shift is cultural. I came across a conversation recently between Boris Cherny (Head of Claude Code) and Cat Wu (Head of Product, Claude Code). They were reflecting on a year of development, and check out what they said about context engineering:

Boris: You know, people used to talk about prompt engineering, then context engineering. This was sort of matching where the model was at the time. Back in the days of Sonnet 3.5 you had to prompt engineer. Back in the days of Opus 4 you had to context engineer. But with the models of today you don’t do any of this. You give it the minimal possible system prompt, the minimal possible...
