Small Context, High Parallelism: How To 10x Reduce Agentic Coding Cost

Reducing total token consumption of agentic coding

Reducing Total Token Consumption of Agentic Coding

TL;DR — Two levers reduce cost:

1. Less turns (parallel tool calls → fewer API round-trips)

2. Less context (methodology + snippets → only relevant info survives)

📊 Understanding the problem: why tokens grow quadratically

What you see vs. what is sent

👤 User side 0 chars

🤖 Sent to the API 0 chars

▶ Next message ⏩ Auto ↺ Reset

The bottom line after 10 exchanges

Characters typed by the user

Characters sent to the API (cumulative)

Multiplication factor

O(n²)

Cumulative cost complexity

Each new message resends the entire history. The total cost grows like the sum 1+2+3+…+n = n(n+1)/2 .

Interactive simulation

Messages (n):

Avg size per message (s):

Tool loops per turn (k):

Classic chatbot vs. agentic coding

💬 Classic chatbot

User: "Fix this bug"→ 200 chars

API response #1system + msg = 4,200 chars

User: "Also add tests"→ 150 chars

API response #2system + history = 8,500 chars

User: "Rename the var"→ 100 chars

API response #3system + history = 12,800 chars

🤖 Agentic coding

User: "Fix this bug"→ 200 chars

API call #1system + msg = 4,200 chars

→ Read(main.py)tool result: 3,000 chars

API call #2everything + tool = 7,400 chars

→ Edit(main.py)tool result: 500 chars

API call #3everything + tools = 8,100 chars

→ RunTests()tool result: 2,000 chars

API call #4everything + tools = 10,300 chars

✓ Result returned to user1 user message → 4 API calls

▶ Animate ↺ Reset

1 user message = 3-15 API calls

Each tool call (Read, Edit, Bash…) triggers a new API call with the full accumulated context.

Agentic cost: the hidden multiplier

Classic chatbot

n × s → O(n²)

10 msgs × 2k = ~110k chars total

Agentic (k loops/turn)

n × k × s → O(n² × k)

10 msgs × 5 loops × 2k = ~550k chars

With agentic coding, the context grows k times faster . A 20-message session with 5 tool loops per turn sends millions of characters to the API.

1. Less Turns = Less Token Consumption

Aggressive parallelization of tool calls reduces API round-trips. I prefer a failed tool call to another full API call.

The 3-turn process

Discover → Read → Act. Group independent calls in the same turn to minimize context resends.

Sequential (8 turns)

Turn 1: Glob("**/*.py")

↓ wait

Turn 2: Grep("handler")

↓ wait

Turn 3: GetFolderDescription()

↓ wait

Turn 4: Read(main.py)

↓ wait

Turn 5: Read(config.py)

↓ wait

Turn 6: Edit(main.py)

↓ wait

Turn 7: Write(test.py)

↓ wait

Turn 8: RunTests()

8 turns = 8 × full context resent

DAG parallel (3 turns)

TURN 1 — discover

Glob("**/*.py")

Grep("handler")

GetFolderDescription()

↓ need results to know what to read

TURN 2 — read

Read(main.py, symbol="handle")

Read(config.py, symbol="Config")

↓ need content to write correct code

TURN 3 — act (all at once with depends_on)

WritePlan

Edit(main.py)

Write(test.py)

RunTests()

3 turns = 3 × context → less tokens

2. Less Context = Less Tokens

The context is append-only. Condense early or pay forever.

The naive approach: keep everything

Each turn resends the entire conversation history. File reads, bash outputs, reasoning — nothing is thrown away.

→ API Call 1

sys

← Response: "I'll read main.py" + tool_call(Read)

→ API Call 2

sys

asst

Read(main.py) 400L

← Response: "Found bug at L42" + tool_call(Edit)

→ API Call 3

sys

asst

main.py 400L

asst

edit OK

← Response: "Running tests" + tool_call(RunTests)

→ API Call 4

sys

asst

main.py 400L

asst

edit

asst

pytest 200L

← Response: "all tests pass! Task done."

■ cached prefix ($0.30/MTok) ■ fresh content ($3.00/MTok)

Append-only: previous turns stay in place (cached), new content goes at the end (full price). The 400 lines of main.py are sent 3 times — you only needed 1 function.

With Snippets: same work, smaller context

You still pay to Read once. But instead of carrying 400 lines forever, you save a 20-line Snippet — the savings start next turn. Works with any tool result: Read, Grep, Skills, GetFolderDescription…

→ API Call 1 (same as naive)

← Response: "I'll read main.py" → tool_call(Read)

→ API Call 2 (same as naive — you pay the 400L once)

Read(main.py) 400L FRESH

← Response: finds bug, emits Snippet(L40-60) + Edit(main.py)

→ API Call 3 (HERE the difference — 400L → 20L Snippet)

Snippet 20L

edit OK

← Response: "done, running tests" → tool_call(RunTests)

→ API Call 4 (still small)

20L

edit

pytest 200L

← Response: "all tests pass! Task done."

Same structure [S][U][A][...] — just 20L instead of 400L from Call 3 onward

With Methodology: append-only working memory

Each turn, the model appends a Methodology note (goal, plan, discoveries) to the cached prefix. Old tool_results are destroyed — only Methodology + Snippets + the original user message survive.

→ API Call 1

← Response: tool_call(Read) + Methodology#1

→ API Call 2 (400L paid once)

Meth#1

Snippet

Read(main.py) 400L FRESH

← Response: Methodology#2 +...

Small Context, High Parallelism: How To 10x Reduce Agentic Coding Cost

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

Apple WWDC 2026 Livestream

Claude Fable 5

It's Not Just X. It's Y

Show HN: GoPeek – open links in live mini browser windows without new tabs