Token Saver for AI Tools

Tokens: The Hidden Currency of AI

Public Service Announcement

Tokens are currency. Spend wisely.

Every word you type to an AI costs money. Most people burn through tokens without knowing why — or how to stop.


█ "Hello, how are you?" = 6 tokens
█ System prompts re-sent with every message
█ Conversation history compounds on every turn
█ GPT-4o: $2.50/M input · $10/M output
█ Claude Sonnet 4: $3/M input · $15/M output
█ Gemini 1.5 Pro: $1.25/M input · $5/M output
█ Whitespace counts · Punctuation counts · Everything counts

01 — Fundamentals

What even is a token?

Tokens are the chunks AI models break text into: roughly 4 characters, or ¾ of a word, in typical English text. They're not words. Not characters. Something in between. Take this sample text:

The quick brown fox jumps over the lazy dog. Hello, world! Tokenization is fascinating.
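To make the rule of thumb concrete, here is a minimal sketch that estimates the stats for the sample text. It assumes the ~4 characters per token heuristic from above; it is not a real tokenizer, and `text_stats` is a hypothetical helper name. Actual counts vary by model.

```python
def text_stats(text: str, chars_per_token: float = 4.0) -> dict:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    characters = len(text)
    words = len(text.split())
    # Assumption: about 4 characters per token for typical English text.
    estimated_tokens = max(1, round(characters / chars_per_token))
    return {
        "characters": characters,
        "words": words,
        "estimated_tokens": estimated_tokens,
        "chars_per_token": round(characters / estimated_tokens, 2),
    }


sample = (
    "The quick brown fox jumps over the lazy dog. "
    "Hello, world! Tokenization is fascinating."
)
stats = text_stats(sample)
print(stats)
```

For exact numbers, run the text through the tokenizer of the model you are actually paying for.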


02 — Model Breakdown

How each model handles tokens

Claude, Gemini Pro, and ChatGPT/Codex all tokenize and price differently. Here's how they actually compare.

Anthropic: Claude Sonnet 4
Context window: 200K tokens
Input cost: $3.00 / 1M tokens
Output cost: $15.00 / 1M tokens
Tokenizer: BPE (custom)
Chars/token: ~3.5–4.5
Multimodal: ✓ Images, PDFs

Google: Gemini 1.5 Pro
Context window: 1M tokens
Input cost: $1.25 / 1M tokens
Output cost: $5.00 / 1M tokens
Tokenizer: SentencePiece
Chars/token: ~3.0–4.0
Multimodal: ✓ Video, Audio, Images

OpenAI: GPT-4o / Codex
Context window: 128K tokens
Input cost: $2.50 / 1M tokens
Output cost: $10.00 / 1M tokens
Tokenizer: tiktoken (o200k_base)
Chars/token: ~4.0–4.5
Multimodal: ✓ Images

Important: Output tokens cost 4–5× as much as input tokens for every model listed above. That verbose AI response you love? It costs 4 to 5 times as much per token as your question did. When a model "thinks out loud" with chain-of-thought reasoning, that reasoning is billed as output tokens too, before you ever see the final answer.
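The input/output gap is easy to check with a few lines of arithmetic. This sketch uses the per-million prices listed in the model cards above; the dictionary keys are illustrative labels, not official API model IDs, and the prices may be out of date by the time you read this.

```python
# (input $, output $) per 1M tokens, copied from the model cards above.
PRICES_PER_MILLION = {
    "claude-sonnet-4": (3.00, 15.00),
    "gemini-1.5-pro": (1.25, 5.00),
    "gpt-4o": (2.50, 10.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed prices."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def output_to_input_ratio(model: str) -> float:
    """How many times more an output token costs than an input token."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return output_price / input_price


for name in PRICES_PER_MILLION:
    print(name, output_to_input_ratio(name), request_cost(name, 1_000, 1_000))
```

The takeaway: capping response length saves more per token than trimming the same number of tokens from your prompt.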

03 — Hidden Costs

Token vampires: what's draining you

These are the silent token consumers most people never think about. Here's how bad each one really is.

👻

System Prompts

▲ RESENT EVERY SINGLE MESSAGE

Your system prompt isn't sent once; it's attached to every single API call. A 500-token system prompt on 1,000 daily API calls = 500,000 extra tokens per day. That's $1.50/day just in system prompt overhead with Claude Sonnet.

500-token system prompt × 1,000 calls/day
= 500,000 extra tokens/day
= ~$1.50/day just in overhead
= ~$547/year in wasted system prompts

Fix: Keep system prompts lean. Move static reference docs to retrieval (RAG) instead of stuffing them in the prompt.
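The system prompt math above can be rerun for your own numbers. A minimal sketch, assuming a flat per-million input price and no prompt caching discounts; `system_prompt_overhead` is a hypothetical helper name.

```python
def system_prompt_overhead(
    prompt_tokens: int,
    calls_per_day: int,
    input_price_per_million: float,
) -> tuple[int, float, float]:
    """Return (extra tokens/day, cost/day, cost/year) for a re-sent system prompt."""
    tokens_per_day = prompt_tokens * calls_per_day
    cost_per_day = tokens_per_day * input_price_per_million / 1_000_000
    cost_per_year = cost_per_day * 365
    return tokens_per_day, cost_per_day, cost_per_year


# The example from the text: 500 tokens, 1,000 calls/day, $3.00 per 1M input tokens.
tokens, per_day, per_year = system_prompt_overhead(500, 1_000, 3.00)
print(f"{tokens:,} tokens/day, ${per_day:.2f}/day, ${per_year:.2f}/year")
```

Halve the prompt and every figure halves with it, which is why trimming the system prompt pays off on every call.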

📜

Conversation History

▲ GROWS QUADRATICALLY PER CHAT

AI models have no memory. Every message in a chat gets re-sent in full to the API, so each request grows with the length of the chat and the running total grows roughly quadratically. By message 20, you might be sending 5,000 tokens of old conversation just to ask one new question. A 30-turn conversation can easily run 15,000+ input tokens, even if each message was short.

Turn 1: 100 tokens sent
Turn 5: 600 tokens sent (all history)
Turn 10: 1,400 tokens sent
Turn 20: 4,200 tokens sent
Turn 30: 9,800 tokens sent 🔥

Fix: Implement conversation summarization — replace old messages with a compressed summary every N turns.
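Here is a rough simulation of why history compounds and what summarizing every N turns buys you. It is a sketch under simplifying assumptions: every user message and every reply is the same size, and a summary collapses the whole history to a fixed token count. The function and parameter names are hypothetical, and no real summarization call is made.

```python
from typing import Optional


def tokens_sent_per_turn(
    turns: int,
    tokens_per_message: int,
    summarize_every: Optional[int] = None,
    summary_tokens: int = 0,
) -> list[int]:
    """Input tokens sent on each turn when the full history is re-sent."""
    history = 0  # tokens of prior conversation carried into the next request
    sent = []
    for turn in range(1, turns + 1):
        request = history + tokens_per_message  # old history + new user message
        sent.append(request)
        history = request + tokens_per_message  # the reply joins the history
        if summarize_every and turn % summarize_every == 0:
            # Assumption: old messages are replaced by a fixed-size summary.
            history = min(history, summary_tokens)
    return sent


full = tokens_sent_per_turn(30, 100)
compact = tokens_sent_per_turn(30, 100, summarize_every=10, summary_tokens=300)
print("no summarization:", sum(full), "input tokens total")
print("summarize every 10 turns:", sum(compact), "input tokens total")
```

The per-request size climbs steadily without summarization, while the summarized version resets every N turns. The summary call itself costs tokens too, which this sketch does not count.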

🖼️

Images & Vision

▲ 1 IMAGE = UP TO 1,700 TOKENS

Uploading an image to a vision model doesn't cost "a little extra." A high-res image with Claude can cost up to 1,700 tokens just for the image itself, before you've typed a word. Where a low-res or low-detail mode exists (OpenAI's vision API offers one), it can drop this to ~85 tokens, and some APIs choose between the two modes for you automatically.

High-res image → up to 1,700 tokens
Low-res image → ~85 tokens
Full-page PDF → ~1,500+ tokens per page
Video (Gemini) → charged per frame extracted

Fix: Resize images before sending. Most tasks don't need full resolution. Use low-res mode when available.
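To see what resizing saves, here is a minimal sketch. It assumes the approximation Anthropic documents for Claude, tokens ≈ width × height / 750; other providers count image tokens differently, so treat the divisor as an assumption. Both function names are hypothetical, and no image library is used, only the dimensions.

```python
def estimate_image_tokens(width: int, height: int, pixels_per_token: int = 750) -> int:
    """Approximate image token cost from pixel dimensions (Claude-style estimate)."""
    return round(width * height / pixels_per_token)


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale dimensions down so the longest side is at most max_side (never upscale)."""
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


original = (1092, 1092)
resized = fit_within(*original, max_side=512)
print(original, "->", estimate_image_tokens(*original), "tokens (estimated)")
print(resized, "->", estimate_image_tokens(*resized), "tokens (estimated)")
```

Because the estimate scales with pixel area, halving each side cuts the token cost to roughly a quarter.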

💬

Verbose Prompting

▲ PLEASANTRIES ARE EXPENSIVE

"Hi! I hope you're doing well today. Could you please help me with something? I'm working on a project and I was wondering if you might be able to..." This preamble costs ~40 tokens and adds zero value. At scale, politeness is pricey.

❌ "Hi! I hope this finds you well. Could you please summarize the following text for me?"
✓ "Summarize:"

Fix: Be direct. AI doesn't need social warmth. Remove preamble, filler, and redundant context.

🔄

Re-asking for Context

▲ COPY-PASTING DOCS REPEATEDLY

When you paste a long document into the chat and ask multiple questions about it in the same session, you're re-sending the entire document with every new message. A 10,000-word document pasted into chat becomes ~13,000 tokens, re-sent on every turn.

Turn 1: Paste 13,000 token document + question
Turn 2: Same 13,000...
