AI Inference Costs: The Wake-Up Call for 2026 and 2027

AI Inference Costs: The Wake-Up Call for 2026 and 2027 · Greg Herlein

19 May 2026, 00:00

ai / budgets / inference / anthropic / github-copilot / cto / enterprise / costs

The era of fixed-fee AI spending just ended. If you’re a CTO or engineering leader and you haven’t noticed yet, you will very soon — probably around September 2026 when some budget alerts start firing.

I’ve been watching this play out for a while now. Ed Zitron wrote a great (and entertainingly profane) newsletter piece this week called “AI Is Too Expensive” that lays out the macro picture — the hyperscaler capex insanity, the lab economics that don’t pencil out, all of it. I’m not going to rehash all of that here.

What I am going to do is tell you what it means for your engineering budget right now.

Because here’s the thing: two very significant pricing shifts are happening simultaneously, and most engineering leaders I talk to are only vaguely aware of one of them.

The Two Shifts You Need to Know About

Shift One: Anthropic ended fixed enterprise pricing.

Starting in late 2025 (and formally landing early 2026), Anthropic restructured enterprise contracts. Gone are the fixed-seat bundles. In their place: a low base seat fee, plus full token consumption billed at API rates. If your engineers are heavy users of Claude Enterprise — coding, reviews, agentic workflows — your spend is now variable and uncapped.
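To make "variable and uncapped" concrete, here is a minimal sketch of a seat-plus-token bill. Every number in it is a placeholder I made up for illustration (the seat fee, the per-million-token rates, the usage volumes); none of it is Anthropic's actual price list. Plug in the rates from your own contract.

```python
from dataclasses import dataclass


@dataclass
class SeatPlusTokenPlan:
    """Hypothetical pricing shape: low fixed seat fee plus metered tokens."""
    seats: int
    base_seat_fee: float         # USD per seat per month (placeholder)
    input_rate_per_mtok: float   # USD per million input tokens (placeholder)
    output_rate_per_mtok: float  # USD per million output tokens (placeholder)

    def monthly_cost(self, input_mtok: float, output_mtok: float) -> float:
        # The fixed part scales with headcount, the variable part with usage.
        fixed = self.seats * self.base_seat_fee
        variable = (input_mtok * self.input_rate_per_mtok
                    + output_mtok * self.output_rate_per_mtok)
        return fixed + variable


plan = SeatPlusTokenPlan(seats=100, base_seat_fee=20.0,
                         input_rate_per_mtok=3.0, output_rate_per_mtok=15.0)

# Same 100 seats, two usage levels (millions of tokens per month).
light = plan.monthly_cost(input_mtok=500, output_mtok=50)
heavy = plan.monthly_cost(input_mtok=5000, output_mtok=500)
print(light, heavy)  # 4250.0 24500.0
```

The seat count never changes between the two calls. Only the token volume does, and that is the part nobody budgeted for.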

Shift Two: GitHub Copilot goes usage-based on June 1, 2026.

This is the one that’s going to surprise a lot of people. GitHub Copilot — which most organizations have been treating as a fixed $19 or $39/user/month line item — is switching to AI Credits on June 1. The seat price stays the same. But that price now only covers code completions. Everything else — Copilot Chat with premium models, code review, agentic features — consumes credits at token rates.

Hoo boy.

If you have 200 engineers on Copilot Enterprise at $39/user, you’ve been budgeting $7,800/month. That number is about to become a floor, not a ceiling.

Wait, There’s a Catch

GitHub is giving customers a buffer during the transition. June through August, Business customers get $30/user/month in extra credits, and Enterprise customers get $70/user/month extra. That sounds nice.

Here’s my concern: that buffer is going to mask the real cost for three months.

The engineers who use Copilot Chat constantly, who run code reviews through it, who are experimenting with agentic features — they’ll burn through $39 of credits pretty fast. But with the extra buffer, it won’t hurt yet. Then September hits, the buffer is gone, and the bill arrives.
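A rough sketch of how the buffer hides the bill. It assumes each Enterprise seat carries $39 of included credits, that credits are tracked per user rather than pooled, and that anything above the allowance is billed dollar for dollar. Those are my simplifying assumptions, not GitHub's published billing rules, and the five usage figures are invented.

```python
def monthly_overage(usage_per_dev, included_credit, promo_credit=0.0):
    """Total spend above each developer's credit allowance, in USD.

    Assumes per-user (non-pooled) credits and 1:1 overage billing.
    """
    allowance = included_credit + promo_credit
    return sum(max(0.0, used - allowance) for used in usage_per_dev)


# Hypothetical monthly credit consumption (USD) for five developers.
usage = [12.0, 35.0, 60.0, 95.0, 180.0]

# June through August: $39 included plus the $70 transition buffer.
with_buffer = monthly_overage(usage, included_credit=39.0, promo_credit=70.0)

# September onward: same usage, buffer gone.
without_buffer = monthly_overage(usage, included_credit=39.0)

print(with_buffer, without_buffer)  # 71.0 218.0
```

Identical behavior, roughly three times the overage once the buffer disappears. Only one of the five developers shows up as over budget during the summer; three do in September.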

If you don’t use June–August as a metering exercise — tracking actual consumption per developer per day — you’re going to be flying blind into your fall budget review.

This. This. This. Use the free credits to measure, not just to spend.
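Here is roughly what that metering exercise could look like. The record format (developer, day, credits consumed) is hypothetical, so adapt it to whatever usage export you actually have. The 21 working days per month is also my assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical usage records: (developer, day, credits consumed in USD).
records = [
    ("alice", date(2026, 6, 1), 1.50),
    ("alice", date(2026, 6, 2), 2.50),
    ("bob", date(2026, 6, 1), 6.00),
    ("bob", date(2026, 6, 2), 4.00),
    ("bob", date(2026, 6, 2), 2.00),
]


def daily_average_per_developer(records):
    """Average credits consumed per active day, keyed by developer."""
    totals = defaultdict(float)
    active_days = defaultdict(set)
    for dev, day, cost in records:
        totals[dev] += cost
        active_days[dev].add(day)
    return {dev: totals[dev] / len(active_days[dev]) for dev in totals}


def projected_monthly(avg_daily, working_days=21):
    """Scale each developer's daily average to a full working month."""
    return {dev: avg * working_days for dev, avg in avg_daily.items()}


averages = daily_average_per_developer(records)
projection = projected_monthly(averages)
print(averages)    # {'alice': 2.0, 'bob': 6.0}
print(projection)  # {'alice': 42.0, 'bob': 126.0}
```

Three months of this gives you a per-developer distribution to bring to the fall budget review instead of a guess.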

What Is This Actually Costing Companies?

Let me give you some concrete data points (with appropriate caveats on sourcing).

Salesforce CEO Marc Benioff said on the All-In podcast earlier this month that Salesforce is on track to spend $300 million on Anthropic tokens in 2026. That’s a confirmed CEO statement, not an estimate.

Uber’s CTO disclosed the company burned through its entire 2026 AI budget in the first few months of the year, driven by Claude Code adoption across roughly 5,000 developers.

A Goldman Sachs research report found that AI inference costs at one unnamed software company were approaching 10% of total headcount costs — and trending toward headcount parity within a few quarters.

Read that last one again. Headcount parity. If you have a $10M engineering payroll, that’s a scenario where you’re spending $10M/year on AI inference on top of it. That math gets uncomfortable fast.

(To be clear: that Goldman figure describes one unnamed company. It’s a data point, not an industry average. But it’s not crazy, either, given the trajectory.)
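For a feel of what "a few quarters" implies, here is a back-of-the-envelope sketch. It assumes inference spend compounds at a constant quarterly rate while headcount cost stays flat. The growth rates are hypothetical inputs I picked, not figures from the Goldman report.

```python
import math


def quarters_to_parity(current_ratio, quarterly_growth):
    """Whole quarters until inference spend reaches headcount cost.

    current_ratio: inference spend / headcount cost today (e.g. 0.10).
    quarterly_growth: assumed compounding growth per quarter (1.0 = doubling).
    Headcount cost is held flat, which is a simplification.
    """
    if current_ratio >= 1.0:
        return 0
    if quarterly_growth <= 0:
        raise ValueError("spend must be growing to ever reach parity")
    return math.ceil(math.log(1.0 / current_ratio)
                     / math.log(1.0 + quarterly_growth))


print(quarters_to_parity(0.10, 1.0))  # doubling every quarter: 4
print(quarters_to_parity(0.10, 0.5))  # growing 50% per quarter: 6
```

Going from 10% to parity is a tenfold increase, so "a few quarters" only works if spend is roughly doubling every quarter. That is the growth rate the claim quietly assumes.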

Why Costs Aren’t Coming Down Quickly

I’ve seen this argument: “Well, silicon is getting cheaper. This will sort itself out.”

Maybe. Eventually. But not before 2027 planning season.

Here’s the structural problem. The labs, Anthropic and OpenAI, are not profitable on inference. Anthropic’s gross margins came in at 40% in 2025, which was 10 percentage points below their own projections, with inference costs running 23% higher than anticipated (The Information, January 2026). OpenAI’s gross margins fell from 40% in 2024 to 33% in 2025, missing their own 46% forecast as inference costs grew fourfold year-over-year to roughly $8.4 billion.

Both labs are under enormous pressure...
