Tokens: The Hidden Currency of AI
Public Service Announcement
Tokens are currency. Spend wisely.
Every word you type to an AI costs money. Most people burn through tokens without knowing why — or how to stop.
"Hello, how are you?" = 6 tokens
System prompts re-sent with every message
Conversation history compounds on every turn
GPT-4o Input: $2.50/M tokens · Output: $10/M tokens
Claude Sonnet 4: $3/M input · $15/M output
Gemini 1.5 Pro: $1.25/M input · $5/M output
Whitespace counts · Punctuation counts · Everything counts
01 — Fundamentals
What even is a token?
Tokens are the chunks AI models break text into: roughly 4 characters or ¾ of a word each in English. They're not words and not characters, but something in between.
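The two rules of thumb above (about 4 characters or ¾ of a word per token) can be turned into a quick estimator. This is only a rough sketch: `estimate_tokens` is a hypothetical helper, not a real tokenizer, and actual counts vary by model and by language.

```python
def estimate_tokens(text: str) -> dict:
    """Rough token estimates from the two rules of thumb (not a real tokenizer)."""
    chars = len(text)
    words = len(text.split())
    return {
        "characters": chars,
        "words": words,
        "by_chars": round(chars / 4),     # assumes ~4 characters per token
        "by_words": round(words / 0.75),  # assumes ~3/4 of a word per token
    }


sample = (
    "The quick brown fox jumps over the lazy dog. "
    "Hello, world! Tokenization is fascinating."
)
stats = estimate_tokens(sample)
print(stats)
```

The two estimates will usually disagree a little; treat the gap between them as a reminder that only the model's own tokenizer gives the billed count.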
02 — Model Breakdown
How each model handles tokens
Claude, Gemini Pro, and ChatGPT/Codex all tokenize and price differently. Here's how they actually compare.
Anthropic
Claude Sonnet 4
Context window: 200K tokens
Input cost: $3.00 / 1M
Output cost: $15.00 / 1M
Tokenizer: BPE (custom)
~Chars/token: ~3.5–4.5
Multimodal: ✓ Images, PDFs
Google
Gemini 1.5 Pro
Context window: 1M tokens
Input cost: $1.25 / 1M
Output cost: $5.00 / 1M
Tokenizer: SentencePiece
~Chars/token: ~3.0–4.0
Multimodal: ✓ Video, Audio, Images
OpenAI
GPT-4o / Codex
Context window: 128K tokens
Input cost: $2.50 / 1M
Output cost: $10.00 / 1M
Tokenizer: tiktoken (o200k_base)
~Chars/token: ~4.0–4.5
Multimodal: ✓ Images
Important: Output tokens cost 4–5× as much as input tokens across all three models. That verbose AI response you love? It's costing up to 5× as much per token as your question did. Chain-of-thought reasoning, where the model "thinks out loud," also quietly burns output tokens before it gives you the final answer.
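To see how the input/output split plays out per request, here is a small sketch that uses the prices listed in the comparison above. The dictionary keys are illustrative labels, not official API model identifiers, and `request_cost` is a hypothetical helper.

```python
# Prices in USD per 1M tokens, copied from the comparison above.
PRICES = {
    "claude-sonnet-4": {"input": 3.00, "output": 15.00},
    "gemini-1.5-pro": {"input": 1.25, "output": 5.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, given its input and output token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


for model, price in PRICES.items():
    ratio = price["output"] / price["input"]
    cost = request_cost(model, input_tokens=1_000, output_tokens=1_000)
    print(f"{model}: output/input = {ratio:.0f}x, 1K in + 1K out = ${cost:.4f}")
```

Because output is the expensive side, capping response length tends to move the bill more than trimming the question does.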
03 — Hidden Costs
Token vampires: what's draining you
These are the silent token consumers most people never think about.
👻
System Prompts
▲ RE-SENT WITH EVERY SINGLE MESSAGE
Your system prompt isn't sent once; it's attached to every single API call. A 500-token system prompt on 1,000 daily API calls adds 500,000 input tokens per day. With Claude Sonnet 4 at $3 per million input tokens, that's $1.50/day in system prompt overhead alone.
// 500-token system prompt × 1,000 calls/day
= 500,000 extra tokens/day
= ~$1.50/day just in overhead
= ~$547/year in wasted system prompts
Fix: Keep system prompts lean. Move static reference docs to retrieval (RAG) instead of stuffing them in the prompt.
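The overhead arithmetic for a system prompt generalizes to any prompt size, call volume, and input price. A minimal sketch, where `system_prompt_overhead` is a hypothetical helper and the inputs are the figures from the example above:

```python
def system_prompt_overhead(
    prompt_tokens: int, calls_per_day: int, input_price_per_million: float
) -> tuple[int, float, float]:
    """Return (extra tokens per day, USD per day, USD per year)."""
    tokens_per_day = prompt_tokens * calls_per_day
    cost_per_day = tokens_per_day * input_price_per_million / 1_000_000
    return tokens_per_day, cost_per_day, cost_per_day * 365


tokens, daily, yearly = system_prompt_overhead(
    prompt_tokens=500, calls_per_day=1_000, input_price_per_million=3.00
)
print(f"{tokens:,} tokens/day, ${daily:.2f}/day, ${yearly:.2f}/year")
```

Halving the prompt halves every figure, which makes prompt length one of the simplest levers to measure.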
📜
Conversation History
▲ GROWS QUADRATICALLY PER CHAT
Chat models keep no memory between API calls, so every earlier message in a chat gets re-sent in full with each new one. By message 20, you might be sending 5,000 tokens of old conversation just to ask one new question. A 30-turn conversation can easily run 15,000+ input tokens, even if each message was short. Since each turn re-sends everything before it, the total across a chat grows roughly quadratically with the number of turns.
Turn 1: 100 tokens sent
Turn 5: 600 tokens sent (all history)
Turn 10: 1,400 tokens sent
Turn 20: 4,200 tokens sent
Turn 30: 9,800 tokens sent 🔥
Fix: Implement conversation summarization — replace old messages with a compressed summary every N turns.
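The growth pattern and the summarization fix can both be sketched in a few lines. This is a simplified model, assuming every message and every reply is the same size, so its figures will not match the illustrative ones above. `tokens_sent_per_turn` and `compact_history` are hypothetical helpers, and the summarizer passed in is a placeholder where a real one would call a model.

```python
from typing import Callable


def tokens_sent_per_turn(turns: int, tokens_per_message: int = 100) -> list[int]:
    """Input tokens sent on each turn when the full history is re-sent.

    Simplification: turn n carries n user messages plus n - 1 earlier
    replies, all of the same size.
    """
    return [(2 * n - 1) * tokens_per_message for n in range(1, turns + 1)]


def compact_history(
    messages: list[str], keep_last: int, summarize: Callable[[list[str]], str]
) -> list[str]:
    """Replace all but the last `keep_last` messages with one summary message."""
    if len(messages) <= keep_last:
        return list(messages)
    cut = len(messages) - keep_last
    return [summarize(messages[:cut])] + messages[cut:]


per_turn = tokens_sent_per_turn(30)
print("turn 30 sends:", per_turn[-1], "tokens")
print("total over 30 turns:", sum(per_turn), "tokens")

# Placeholder summarizer (assumption): it only reports how much it replaced.
history = [f"message {i}" for i in range(1, 21)]
compacted = compact_history(
    history,
    keep_last=4,
    summarize=lambda old: f"[summary of {len(old)} earlier messages]",
)
print(len(history), "->", len(compacted), "messages")
```

Per-turn size grows linearly here, but the running total grows with the square of the turn count, which is why compacting early pays off more than compacting late.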
🖼️
Images & Vision
▲ 1 IMAGE = UP TO 1,700 TOKENS
Uploading an image to a vision model doesn't cost "a little extra." A high-res image with Claude can cost up to 1,700 tokens just for the image itself, before you've typed a word. Where a low-res mode is available, it can drop this to ~85 tokens, but that tradeoff is often made for you automatically.
High-res image → up to 1,700 tokens
Low-res image → ~85 tokens
Full-page PDF → ~1,500+ tokens per page
Video (Gemini) → charged per frame extracted
Fix: Resize images before sending. Most tasks don't need full resolution. Use low-res mode when available.
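To get a feel for what resizing saves, the sketch below estimates image tokens before and after downscaling. It assumes the approximation tokens ≈ width × height / 750, which is the rule of thumb Anthropic's vision documentation gives for Claude. Other providers count images differently, so treat the constant as an assumption and check current docs. `fit_within` and `estimate_image_tokens` are hypothetical helpers that work on dimensions only and do not touch image files.

```python
def fit_within(width: int, height: int, max_edge: int) -> tuple[int, int]:
    """Scale dimensions down (never up) so the longer edge is at most max_edge."""
    scale = min(1.0, max_edge / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def estimate_image_tokens(width: int, height: int, pixels_per_token: int = 750) -> int:
    """Approximate token cost of an image (assumed: ~750 pixels per token)."""
    return round(width * height / pixels_per_token)


original = (1200, 900)
resized = fit_within(*original, max_edge=600)

before = estimate_image_tokens(*original)
after = estimate_image_tokens(*resized)
print(f"{original} -> ~{before} tokens")
print(f"{resized} -> ~{after} tokens")
```

Halving each edge cuts the pixel count, and so the estimated token count, to a quarter.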
💬
Verbose Prompting
▲ PLEASANTRIES ARE EXPENSIVE
"Hi! I hope you're doing well today. Could you please help me with something? I'm working on a project and I was wondering if you might be able to..." This preamble costs ~40 tokens and adds zero value. At scale, politeness is pricey.
❌ "Hi! I hope this finds you well. Could you please summarize the following text for me?"
✓ "Summarize:"
Fix: Be direct. AI doesn't need social warmth. Remove preamble, filler, and redundant context.
🔄
Re-asking for Context
▲ COPY-PASTING DOCS REPEATEDLY
When you paste a long document into the chat and ask multiple questions about it in the same session, you're re-sending the entire document with every new message. A 10,000-word document pasted into chat becomes ~13,000 tokens, re-sent on every turn.
Turn 1: Paste 13,000-token document + question
Turn 2: Same 13,000...