Show HN: Claude Code's $200 plan is a 17× subsidy on the raw API

coral-ai/claude-code-token-xray at main · Coral-Bricks-AI/coral-ai · GitHub

claude-code-token-xray

Reverse-engineer a month of local Claude Code logs (~/.claude/projects/*/*.jsonl) into where the tokens, time, and cost actually go, then run it on your own. Reads only local logs; nothing is sent anywhere.

What it found (one month of my own logs — 181 sessions, 25,564 model calls):

You don't pay to generate, you pay to re-read. ~29M unique tokens → 4.35B billed (~150×), because every turn re-sends the whole ~173K-token context.

The bill is 84% input / 16% output — and re-reading the same context is 64% of it.

The biggest line is the one you never see: hidden reasoning is 84% of output and ~60% of everything re-read.

~$3,371 for the month at Opus 4.7 list rates. Caching already serves 98% of input — and re-reading is still 64% of the bill.
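The re-read effect in the findings above can be illustrated with a toy model. This is a minimal sketch, not the repo's code: it assumes every turn re-sends the full accumulated context, and the turn sizes are made-up numbers rather than values from the logs.

```python
def billed_input_tokens(turn_sizes):
    """Total input tokens billed when every turn re-sends all prior context.

    turn_sizes: new tokens added to the context on each turn (hypothetical).
    """
    context = 0
    billed = 0
    for added in turn_sizes:
        context += added   # context grows by this turn's new tokens
        billed += context  # the whole context is sent again as input
    return billed


turns = [1_000] * 100                # 100 turns, 1,000 new tokens each (made up)
unique = sum(turns)                  # 100,000 unique tokens
billed = billed_input_tokens(turns)  # 1,000 * (1 + 2 + ... + 100)
print(f"unique={unique:,} billed={billed:,} ratio={billed / unique:.1f}x")
# prints: unique=100,000 billed=5,050,000 ratio=50.5x
```

Billed input grows roughly with the square of session length while unique tokens grow linearly, which is why long sessions push the ratio so high.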

Full write-up (all the tables, the why, the main-thread-vs-subagent split) → coralbricks.ai/blog/claude-code-token-xray

Quickstart

    pip install -r requirements.txt  # just tiktoken
    python3 token_time_breakdown.py
    python3 cost.py
    python3 main_vs_sidecar.py
    python3 reread_breakdown.py

tiktoken is OpenAI's tokenizer, not Claude's, so token proportions are reliable to ~±15%, not Claude-exact. The billed-token counts in cost.py come straight from the API usage blocks and are exact.
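Summing the usage blocks might look roughly like the sketch below. It is not taken from cost.py. The four field names are the ones the Anthropic API reports in its usage object; their location under `record["message"]["usage"]` in the local JSONL logs is an assumption, and `sum_usage` is a hypothetical helper name.

```python
import json
from collections import Counter

# Field names follow the Anthropic API usage object. Finding them under
# record["message"]["usage"] in the local logs is an assumption.
USAGE_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def sum_usage(lines):
    """Add up usage counters across an iterable of JSONL lines."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # records without a usage block carry no billing data
        for field in USAGE_FIELDS:
            value = usage.get(field)
            if isinstance(value, int):
                totals[field] += value
    return dict(totals)


# Made-up sample records, only to show the call shape.
sample = [
    json.dumps({"message": {"usage": {"input_tokens": 3,
                                      "cache_read_input_tokens": 170_000,
                                      "cache_creation_input_tokens": 500,
                                      "output_tokens": 120}}}),
    json.dumps({"message": {"content": "hi"}}),
    "not json",
]
print(sum_usage(sample))
```

To try it on real logs, one option is to iterate over `pathlib.Path.home().glob(".claude/projects/*/*.jsonl")` and pass each opened file to `sum_usage`.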

What a month cost

From cost.py on my logs, priced at Opus 4.7 list rates:

| Line item | Cost | Share |
| --- | ---: | ---: |
| Input: re-reading context (cache reads) | $2,176 | 64% |
| Input: cache writes | $682 | 20% |
| Input: fresh (uncached) | $2 | 0% |
| Output: reasoning | $429 | 13% |
| Output: tool calls + summaries | $82 | 2% |
| Total | $3,371 | 100% |

Caching is the only thing keeping it sane — without it the same work lists at ~$22,630 (~7×). Your numbers will differ; that's the point. Run it on yours.
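The headline multiples follow from the figures above with simple arithmetic. The sketch below uses only numbers quoted in this README and the title; reading the title's 17× as the list-price total divided by the $200 plan price is an assumption about how that figure was derived.

```python
# Dollar figures copied from the cost table above.
line_items = {
    "cache reads": 2176,
    "cache writes": 682,
    "fresh input": 2,
    "reasoning output": 429,
    "tool calls + summaries": 82,
}
NO_CACHE_LIST_COST = 22_630  # no-caching counterfactual quoted above
PLAN_PRICE = 200             # monthly plan price from the title

total = sum(line_items.values())
print(f"total: ${total:,}")                                    # total: $3,371
print(f"no caching vs actual: {NO_CACHE_LIST_COST / total:.1f}x")  # 6.7x
print(f"list cost vs plan: {total / PLAN_PRICE:.1f}x")              # 16.9x
```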

Scripts

token_time_breakdown.py — the headline table: tokens (marked input/output) and wall-clock time per activity (reasoning, running commands, writing tool calls, subagents, summaries, reading/searching, editing) plus the passive-context rows (system prompt + tools, attachments, the typed prompt, injected reminders). One pass, so tokens and time stay consistent. Reasoning isn't stored in plaintext (only an encrypted signature), so it's recovered by subtraction: output − tool_calls − summaries. Time is reconstructed from event timestamps.

cost.py — billed token totals (cache reads / cache writes by TTL / fresh input / output) priced at Opus 4.7 list rates, plus the no-caching counterfactual.

main_vs_sidecar.py — splits the human-driven main thread from spawned subagents (logged under nested */subagents/*.jsonl); reports billed tokens, per-model mix, cache-hit rate, turns per agent (per session for the main thread, per subagent for the sidecar), and cost for each, plus the combined total.

reread_breakdown.py — per-activity cumulative input: replays each session's context growth to show what each kind of context costs once it's re-read every turn. Reports unique vs re-read tokens per activity (reasoning is the biggest re-read line). The replay is scaled to the measured billed input (exact); the per-activity split is a model.
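Two of the ideas in the script descriptions above are simple enough to sketch: recovering reasoning tokens by subtraction, and telling subagent logs from main-thread logs by path. This is an illustration under assumptions, not the scripts' actual code. The function names and example paths are hypothetical, and clamping at zero is a choice made here because the subtracted terms are tokenizer estimates.

```python
from pathlib import PurePosixPath


def is_subagent_log(path):
    """True if the log file sits directly inside a 'subagents' directory."""
    return PurePosixPath(path).parent.name == "subagents"


def reasoning_tokens(output_tokens, tool_call_tokens, summary_tokens):
    """Estimate hidden reasoning as output minus the visible parts.

    Clamped at zero (an assumption of this sketch): the visible parts are
    estimated with a different tokenizer and can overshoot the exact output.
    """
    return max(output_tokens - tool_call_tokens - summary_tokens, 0)


# Hypothetical paths and counts, only to show the call shape.
print(is_subagent_log("projects/demo/abc/subagents/agent-1.jsonl"))  # True
print(is_subagent_log("projects/demo/session.jsonl"))                # False
print(reasoning_tokens(1_000, 120, 40))                              # 840
```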

Caveats

One person's month on one machine — directional, not a benchmark. Claude Code is dynamic, so your split will differ. That's the point: run it on yours.

A generation-time gap also includes the model reading its context...
