Tokenomics: The 62.5-minute rule for Claude's cache

gmays1 pts0 comments

Tokenomics: the 62.5-minute rule for Claude's cache | Ryan Skidmore

Skip to content<br>&larr; Posts Tokenomics: the 62.5-minute rule for Claude's cache<br>Sunday, 17 May 2026 8 min read ai<br>tokenomics

Is it more efficient to refresh the 5-min cache, let it expire, or just rely compaction?<br>Unfortunately one of the downsides of being a chronic tokenmaxxer is regularly hitting 5-hour and weekly token limits across several providers. This often comes at the most inconvinient time possible when you’re in the middle of something and ideally I’d prefer to not spend any more money on additional AI subscriptions if possible. I started looking a little more closely at my request logs to see if this was a skill issue and I noticed that I’m writing my entire context (which can be as high as 400k/500k in some sessions) to the cache a little more often than I should be. Each write was pretty small in isolation, but added up pretty quickly.

5 minutes really isn’t a long time, so it’s easy to get distracted and miss the cache refresh and pay for the full prefix write. This got me thinking, if a prompt cache is about to expire and I don’t have a real request to send, is it cheaper to ping it with a keep-alive, or let it die and rewrite it later?

tl;dr: The answer is 62.5 minutes . If you expect to need the cache again before then, refresh it. If not, let it expire. That number doesn’t move when you switch between models and it doesn’t move when the cached prefix grows from 5K tokens to 500K. The dollars change, but the decision point is still the same.

The numbers

Anthropic’s pricing page lists prompt caching as a set of multipliers on the normal input-token price:

ModelBase input5-min cache write1-hour cache writeCache read / refreshOutputOpus 4.7$5 / MTok$6.25 / MTok$10 / MTok$0.50 / MTok$25 / MTokSonnet 4.6$3 / MTok$3.75 / MTok$6 / MTok$0.30 / MTok$15 / MTokHaiku 4.5$1 / MTok$1.25 / MTok$2 / MTok$0.10 / MTok$5 / MTok<br>The multipliers are the same for every model: a 5-minute cache write costs 1.25x the base input price, a 1-hour cache write costs 2x, and a cache read costs 0.10x.

Read operations do two jobs: A request that hits a live cache is billed at the read rate, and the same request refreshes the cache TTL back to 5 minutes, so cache hit = cache refresh.

The trick to keeping the cache warm is a super tiny request that reads the cached prefix before the TTL runs out. The cost is 10% of the normal input price for that prefix, but the catch is that you have to keep doing it until you need it again.

A case study of a 100K-token prefix

Let’s take Opus 4.7 and a 100K-token cached prefix as an example. That’s not a massive context window, but really easy to hit considering it’s usually just enough to cover a system prompt, tool definitions, a project sketch, and some running notes from an agent session.

Writing that prefix to the 5-minute cache costs:

100K tokens * $6.25 / MTok = $0.625<br>Reading it, which also refreshes it, costs:

100K tokens * $0.50 / MTok = $0.05<br>If I keep the cache alive for T minutes, I pay the first write and then one read every 5 minutes:

refresh_cost(T) = W + R * floor(T / 5)<br>If I let the cache expire and come back later, I pay the first write and then a second write:

rewrite_cost(T) = W + W<br>= 2W<br>The break-even is where the refresh reads add up to one extra write:

W + R * (T / 5) = 2W<br>R * (T / 5) = W<br>T = 5 * (W / R)<br>= 5 * (1.25 / 0.10)<br>= 62.5 minutes<br>The exact boundary is a little stair-stepped in practice, because you refresh in 5-minute chunks rather than in continuous time. That doesn’t change the rule though because below about an hour, refreshing always wins. Past an hour, it’s no longer efficient to keep paying the keepalive tax.

What cancels out

I expected the answer to depend on the model or the text size, but surprisingly it doesn’t. Both sides of the comparison scale with the model’s base input price and the number of cached tokens. A bigger prefix makes both strategies more expensive and Opus makes both strategies more expensive than Sonnet, but when you divide the write price by the refresh price, all of that disappears:

W / R = (N * base * 1.25) / (N * base * 0.10)<br>= 1.25 / 0.10<br>= 12.5<br>That is why the 62.5 minute timing rule is the same for a 5K Sonnet prefix and a 500K Opus prefix, but the dollar damage from choosing suboptimally changes between the two models.

For a 100K prefix on Opus 4.7 and Sonnet 4.6, both pairs land on the same x-axis:

Refresh vs. rewrite cumulative cost<br>Cumulative cost on a 100K-token cached prefix as a function of minutes since<br>the last cache write. Solid lines show the refresh strategy for Opus 4.7<br>and Sonnet 4.6; dashed lines show the rewrite strategy. All four lines<br>cross at exactly 62.5 minutes, which is the same regardless of model or<br>prefix size.

$0.00<br>$0.50<br>$1.00<br>$1.50<br>$2.00 0 30 60 90 120<br>refresh wins

let it expire

crossover at 62.5 min (same for every model)

Refresh vs. rewrite: cumulative cost on a 100K-token cached prefix<br>Opus 4.7...

cache mtok prefix refresh write minutes

Related Articles