Where AI coding spend goes: 48% code, 40% thinking

Less Than Half My AI Coding Spend Is Actually Writing Code - CodeBurn | CodeBurn

May 29, 2026

A real-data breakdown of where the money actually goes: measured, not estimated.

Over the last 30 days I spent $7,890 across 105,718 AI coding API calls. I assumed most of that was the models writing code. It was not. Only 47.9% of my spend landed in the Coding category. The rest went to exploring codebases, debugging, delegating to subagents, and plain back-and-forth conversation.

I only know this because I built a tool to measure it. CodeBurn is a CLI that reads your AI coding session data directly from disk and classifies every API call into one of 13 task categories. No API keys, no wrappers, no data leaving your machine. The classification is fully deterministic (based on tool-usage patterns and message content, not an LLM), so the numbers are reproducible.

Here is where the money actually went:

Task category    Cost      % of spend
Coding           $3,781    47.9%
Exploration      $876      11.1%
Delegation       $765      9.7%
Debugging        $695      8.8%
Feature Dev      $654      8.3%
Conversation     $462      5.9%
Brainstorming    $294      3.7%
Testing          $168      2.1%
Refactoring      $132      1.7%
Git Ops          $34       0.4%
Build/Deploy     $21       0.3%
Planning         $4
General          $4
Total            $7,890    100%

Coding is the single largest bucket, but it is still less than half. If you group everything that produces code (Coding, Feature Dev, Refactoring, and Testing), you get about 60%. Which means roughly 40% of my spend was the model thinking, not typing: reading files, reasoning about the problem, talking through approaches, and chasing bugs.

That is not waste. Exploration and debugging are real work. But it reframes what I am paying for. I was not buying a code generator. I was buying a collaborator that spends a large share of its budget understanding the problem before it writes a line.

    npx codeburn@latest

The rest of this post walks through how CodeBurn produces numbers like these, using real screenshots from the same workflow.

The Dashboard

Run codeburn with no arguments and you get an interactive TUI dashboard.
It loads the last 7 days by default. Arrow keys switch between Today, 7 Days, 30 Days, This Month, and 6 Months.

[Screenshot: CodeBurn interactive dashboard showing daily cost, projects, models, activity breakdown, core tools, and shell commands]

Everything is on one screen. Top row: total cost, number of API calls, sessions, and cache hit rate. Below that: daily cost chart, per-project breakdown with average cost-per-session, activity categories with one-shot rates, model usage, core tool distribution, and the exact shell commands your AI ran.

The activity panel is where it gets interesting. CodeBurn classifies every API call into 13 task categories based on tool usage patterns and message keywords: Coding, Conversation, Feature Dev, Exploration, Debugging, Refactoring, Testing, Delegation, Git Ops, Build/Deploy, Brainstorming, Planning, and General. The classification is fully deterministic. No LLM calls.

In the screenshot above, Coding accounts for $19.08 across 38 turns with an 88% one-shot rate. Conversation is $3.29 across 24 turns. That ratio matters. If Conversation consistently dominates your spend, you are paying for chat, not output.

Model Breakdown by Task

The models command gives you a per-model cost table. Add --by-task and it breaks each model out into one row for every task type it was used for.

    codeburn models --by-task

[Screenshot: per-model, per-task token and cost breakdown across Claude Opus 4.6, GPT-5.5, Sonnet 4.6, Haiku 4.5, and Cursor]

This is real data from a 30-day window. Opus 4.6 spent $119.68 on Coding alone, with 604.4K output tokens and 155.1M cache reads. GPT-5.5 on Codex did $4.63 on Feature Dev and $2.59 on Coding. Sonnet 4.6 handled Exploration for $2.04 with 1.1M cache reads. Haiku 4.5 did lightweight Exploration at $0.297.

The table shows Input, Output, Cache Write, Cache Read, Total tokens, and Cost for every combination. You can see exactly where each model earns its keep and where it might be overkill.
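One way to read a table like this is to line models up against each other within a single task. The Python sketch below does that using only the five cost figures quoted above. The row layout and the function names (cost_by_task, task_share) are hypothetical illustrations, not CodeBurn's export format or API, and the real table has more rows than these.

```python
from collections import defaultdict

# (model, task, cost in USD): only the five figures quoted in the text.
# This tuple layout is a hypothetical stand-in, not CodeBurn's output format.
ROWS = [
    ("Opus 4.6", "Coding", 119.68),
    ("GPT-5.5", "Feature Dev", 4.63),
    ("GPT-5.5", "Coding", 2.59),
    ("Sonnet 4.6", "Exploration", 2.04),
    ("Haiku 4.5", "Exploration", 0.297),
]


def cost_by_task(rows):
    """Group rows by task; within each task, sort (model, cost) by cost, highest first."""
    grouped = defaultdict(list)
    for model, task, cost in rows:
        grouped[task].append((model, cost))
    return {
        task: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for task, entries in grouped.items()
    }


def task_share(rows, task):
    """Return each model's fraction of the total cost recorded for one task."""
    entries = [(model, cost) for model, t, cost in rows if t == task]
    total = sum(cost for _, cost in entries)
    return {model: cost / total for model, cost in entries}


if __name__ == "__main__":
    for task, entries in cost_by_task(ROWS).items():
        print(task)
        for model, cost in entries:
            print(f"  {model:<12} ${cost:>8.2f}")
```

On these five rows alone, Sonnet 4.6 accounts for about 87% of the quoted Exploration cost and Haiku 4.5 for the rest. Whether that split means one model is overkill depends on what each was asked to do, which the cost figures by themselves do not show.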
    codeburn models --task debugging --provider claude
    codeburn models --top 5
    codeburn models --format markdown

Filter by task, provider, or limit to top N. The markdown format is useful for pasting into PRs or team docs.

Waste Detection

The optimize command scans your session history and your local config for specific, fixable waste patterns. Every finding includes the estimated token and dollar savings, and a ready-to-paste fix.

    codeburn optimize

[Screenshot: CodeBurn optimize output showing Health: F (20/100), 6 issues found, with potential savings of ~25.4M tokens (~$17.18)]

This scan found 6 issues across 54 sessions and $216.35 of spend. Total potential savings: ~25.4M tokens, roughly $17.18 or 8% of the total. The setup health grade is F (20/100).

The first finding flags 2 expensive sessions with weak delivery signals. One session cost $116.17 with 28 retries. Another cost $4.20 with no edit turns at all. These are review candidates, not proof of waste. CodeBurn flags them so you can decide whether...
