Turn a $3M AI bill into $1.9M

Turn a $3m AI bill into $1.9m | Will Hackett


Right now, someone on your team is using the most expensive AI model to edit a slide deck. They didn’t choose it; it’s just the default. Repeat that invisible choice a few thousand times a day, and your AI bill quickly starts to resemble payroll.

Two things are inflating that number. The default model is wrong for the task: flagship prices for work a cheaper one would nail. And the task is invisible on the invoice, a single lump sum with no way to tell which project or which model it went on.

Flowstate sits in the request path to close both leaks. We route each prompt to the model the task actually needs, and tie every dollar to the work it paid for. Nobody ships less: the same output that ran you $3m now costs $1.9m, and for the first time you can see which work the money actually bought.

You’re paying Opus prices for Sonnet work

Almost nobody picks a model. They use whatever’s selected when the box loads, and the default is the flagship: the most expensive model on offer. That’s the right call for a genuinely hard problem and pure waste on a one-line email. You can’t expect a marketer to know their default chat window costs five times more than necessary. The price isn’t on the screen, and the vendor has no incentive to show it.

So don’t make them learn it. The task should pick the model, not the person typing, and that decision belongs at the request layer rather than in anyone’s head. A summary or a reformat goes to Haiku, everyday coding and drafting to Sonnet, the genuinely hard reasoning to Opus. You can even split a single job, planning it in Opus and running the execution on Sonnet, keeping the expensive thinking for the steps that need it. The person, whatever they do all day, types the same prompt and gets the same answer. The bill is just smaller. And this isn’t only Claude Code: the same default sits in front of every chat your sales, ops and marketing people open too.
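As a rough illustration of letting the task pick the model, here is a minimal sketch of a request-layer router. It is not Flowstate's implementation: the tier labels, the keyword lists and the function names `route` and `route_split_job` are all assumptions made up for this example, and a production router would use a proper classifier rather than keyword matching.

```python
# Hypothetical tier labels; real model identifiers would come from your provider.
CHEAP, MID, FLAGSHIP = "haiku", "sonnet", "opus"

# Illustrative keyword hints only, not a tested classifier.
LIGHT_HINTS = ("summarize", "summarise", "reformat", "translate", "fix typos")
HARD_HINTS = ("prove", "architecture", "root cause", "trade-off")


def route(prompt: str) -> str:
    """Pick a tier from the task, not from whatever default the UI loaded."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS):
        return FLAGSHIP  # genuinely hard reasoning
    if any(hint in text for hint in LIGHT_HINTS):
        return CHEAP  # summaries and reformatting
    return MID  # everyday coding and drafting


def route_split_job(task: str) -> list[tuple[str, str]]:
    """Split one job: plan on the flagship tier, execute on the mid tier."""
    return [("plan: " + task, FLAGSHIP), ("execute: " + task, MID)]


if __name__ == "__main__":
    for prompt in (
        "Summarize this meeting transcript",
        "Write a function that parses CSV rows",
        "Find the root cause of this deadlock",
    ):
        print(route(prompt), "<-", prompt)
```

The person typing never sees the tier; the decision lives in one function that can be tuned or replaced without retraining anyone.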

How much smaller? Peer-reviewed research, like Ding et al.’s Hybrid LLM, shows you can cut calls to the expensive model by up to 40% with no measurable drop in quality [1]. It’s just arithmetic on your model mix, and it works on any deployment you legitimately run.
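To make the arithmetic on a model mix concrete, here is a small sketch. The 5:1 price ratio is an assumption chosen for illustration, not a quoted rate, and `blended_cost` and `savings_fraction` are hypothetical helper names. It also shows that a 40% cut in flagship calls is not the same thing as a 40% cut in cost: the saving depends on how much cheaper the other model is.

```python
# Relative price per call for each tier. These numbers are hypothetical.
PRICES = {"flagship": 5.0, "cheaper": 1.0}


def blended_cost(mix: dict[str, float], prices: dict[str, float]) -> float:
    """Average cost per call, given the share of calls sent to each tier."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("model mix shares must sum to 1")
    return sum(share * prices[tier] for tier, share in mix.items())


def savings_fraction(
    before: dict[str, float], after: dict[str, float], prices: dict[str, float]
) -> float:
    """Fraction of spend saved by moving from one mix to another."""
    return 1.0 - blended_cost(after, prices) / blended_cost(before, prices)


# Everything on the flagship by default, versus 40% of calls routed away.
before = {"flagship": 1.0, "cheaper": 0.0}
after = {"flagship": 0.6, "cheaper": 0.4}

print(f"{savings_fraction(before, after, PRICES):.0%}")  # prints 32%
```

Under these assumed prices, moving 40% of calls saves 32% of spend; a wider price gap or a three-tier mix would move that figure.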

This is the lever that grows with usage: the harder your team leans on AI, the more a wrong-model default costs you, and the more routing hands back. In the calculator below it’s the gap between your bold line and the green one.

The bill you can’t see

It’s day one. An engineer joins a company, gets handed an Enterprise Claude account, and burns $145 in his first five prompts. On a flat-rate plan that usage would have stretched all week; on a metered Enterprise plan, it’s gone before lunch. HR is already asking questions he can’t answer, and he’s doing the maths on a $5,000 month: “more than my salary.” Where the usage page should show a limit, it shows one word: Unlimited. That’s a real post from r/ClaudeCode, and it’s the second leak in a single screenshot.

The first leak was the model nobody chose. This is the other: the meter nobody’s watching. As of this year, Enterprise charges for every token your team spends in chat, Claude Code and Cowork, at standard API rates on top of the seat [2]. (Teams keeps a flat seat with an included allowance instead.) Metered pricing is cheap for a light team but runs away from you at scale, and because it lands as one undifferentiated invoice, nobody catches the spike until finance raises a flag. You can’t route what you can’t see, and you can’t choose between two deployments you’ve never compared. So compare them:
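A back-of-envelope version of that comparison might look like the sketch below. Every number in it is a placeholder rather than a real price: the seat fees, the per-million-token rate and the function names are assumptions, and the flat plan is simplified to a seat fee only, ignoring whatever happens past the included allowance.

```python
def metered_annual(seats: int, tokens_m: float, seat_fee: float, rate_per_m: float) -> float:
    """Annual cost when every token is billed on top of a monthly seat fee.

    tokens_m is millions of tokens per seat per month.
    """
    return seats * 12 * (seat_fee + tokens_m * rate_per_m)


def flat_annual(seats: int, seat_fee: float) -> float:
    """Annual cost of a flat seat (simplified: no overage modelled)."""
    return seats * 12 * seat_fee


def break_even_tokens_m(metered_seat_fee: float, flat_seat_fee: float, rate_per_m: float) -> float:
    """Monthly tokens per seat (in millions) at which both plans cost the same."""
    return (flat_seat_fee - metered_seat_fee) / rate_per_m


# Placeholder prices, in dollars. Substitute your own contract terms.
METERED_SEAT, FLAT_SEAT, RATE_PER_M = 20.0, 100.0, 5.0

threshold = break_even_tokens_m(METERED_SEAT, FLAT_SEAT, RATE_PER_M)
print(f"break-even: {threshold:.0f}M tokens per seat per month")

for usage in (4, 16, 64):
    metered = metered_annual(10, usage, METERED_SEAT, RATE_PER_M)
    flat = flat_annual(10, FLAT_SEAT)
    print(usage, metered, flat)
```

Below the break-even point the metered plan is the cheaper one; above it, cost grows linearly with usage while the flat seat stays put.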

Pick your door. Size your team. Drag the usage. Annual AI spend for your chosen deployment (bold), the same deployment with Flowstate routing, and the other doors for comparison. The gap to the green line is what routing saves you; the gap to the cheapest dashed line is what the door costs you.

Example calculator readout at 375M tokens per team member per month:

Claude for Enterprise, today (your selection, no routing): $3.26m/yr

With Flowstate routing: $1.89m/yr, saves $1.38m (42%)

Cheapest door, Teams (Premium): $2.35m/yr, $915k less than your door

[Interactive chart: annual AI spend ($0 to $10.50m) against tokens per team member per month (0M to 1000M), with lines for Claude for Enterprise today, Claude for Enterprise + Flowstate, and Claude for Teams (Premium). Controls: deployment strategy, team members (seats), tokens per team member per month.]


Watch what happens as you drag it. At low usage the two doors barely differ, and Enterprise is actually the cheaper one, which is why none of this matters for a light team. Push the usage up and the metered line runs away. Routing pulls a third to a half straight back off it, and at the top end even moving to a flat Teams seat starts to win. But you can only make either move once you can see the bill clearly enough to compare, and most teams can’t.
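The sample readout above can be sanity-checked in a few lines. This sketch uses only the rounded figures shown on the page ($3.26m, $1.89m and $2.35m per year), so the results can differ from the displayed savings by a rounding step; the variable names are illustrative.

```python
# Rounded annual figures from the sample calculator readout, in $m per year.
enterprise_today = 3.26
enterprise_routed = 1.89
teams_premium = 2.35

routing_saving = enterprise_today - enterprise_routed
routing_fraction = routing_saving / enterprise_today
door_saving = enterprise_today - teams_premium

print(f"routing saves about ${routing_saving:.2f}m ({routing_fraction:.0%})")
print(f"switching door saves about ${door_saving:.2f}m")

# The claimed range: routing pulls back between a third and a half.
print(1 / 3 <= routing_fraction <= 1 / 2)
```

At this usage level, routing saves more than switching doors does, and the two moves are not mutually exclusive.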

Which projects actually paid for themselves

Routing...
