Small Context, High Parallelism: How To 10x Reduce Agentic Coding Cost

flinch_again1 pts0 comments

Reducing total token consumption of agentic coding

Reducing Total Token Consumption<br>of Agentic Coding

TL;DR β€” Two levers reduce cost:

1. Less turns (parallel tool calls β†’ fewer API round-trips)

2. Less context (methodology + snippets β†’ only relevant info survives)

πŸ“Š Understanding the problem: why tokens grow quadratically

What you see vs. what is sent

πŸ‘€ User side<br>0 chars

πŸ€– Sent to the API<br>0 chars

β–Ά Next message<br>⏩ Auto<br>β†Ί Reset

The bottom line after 10 exchanges

Characters typed by the user

Characters sent to the API (cumulative)

Multiplication factor

O(nΒ²)

Cumulative cost complexity

Each new message resends the entire history. The total cost grows like the sum 1+2+3+…+n = n(n+1)/2 .

Interactive simulation

Messages (n):

10

Avg size per message (s):

2k

Tool loops per turn (k):

Classic chatbot vs. agentic coding

πŸ’¬ Classic chatbot

User: "Fix this bug"β†’ 200 chars

API response #1system + msg = 4,200 chars

User: "Also add tests"β†’ 150 chars

API response #2system + history = 8,500 chars

User: "Rename the var"β†’ 100 chars

API response #3system + history = 12,800 chars

πŸ€– Agentic coding

User: "Fix this bug"β†’ 200 chars

API call #1system + msg = 4,200 chars

β†’ Read(main.py)tool result: 3,000 chars

API call #2everything + tool = 7,400 chars

β†’ Edit(main.py)tool result: 500 chars

API call #3everything + tools = 8,100 chars

β†’ RunTests()tool result: 2,000 chars

API call #4everything + tools = 10,300 chars

βœ“ Result returned to user1 user message β†’ 4 API calls

β–Ά Animate<br>β†Ί Reset

1 user message = 3-15 API calls

Each tool call (Read, Edit, Bash…) triggers a new API call with the full accumulated context.

Agentic cost: the hidden multiplier

Classic chatbot

n Γ— s β†’ O(nΒ²)

10 msgs Γ— 2k = ~110k chars total

Agentic (k loops/turn)

n Γ— k Γ— s β†’ O(nΒ² Γ— k)

10 msgs Γ— 5 loops Γ— 2k = ~550k chars

With agentic coding, the context grows k times faster . A 20-message session with 5 tool loops per turn sends millions of characters to the API.

1. Less Turns = Less Token Consumption

Aggressive parallelization of tool calls reduces API round-trips. I prefer a failed tool call to another full API call.

The 3-turn process

Discover β†’ Read β†’ Act. Group independent calls in the same turn to minimize context resends.

Sequential (8 turns)

Turn 1: Glob("**/*.py")

↓ wait

Turn 2: Grep("handler")

↓ wait

Turn 3: GetFolderDescription()

↓ wait

Turn 4: Read(main.py)

↓ wait

Turn 5: Read(config.py)

↓ wait

Turn 6: Edit(main.py)

↓ wait

Turn 7: Write(test.py)

↓ wait

Turn 8: RunTests()

8 turns = 8 Γ— full context resent

DAG parallel (3 turns)

TURN 1 β€” discover

Glob("**/*.py")

Grep("handler")

GetFolderDescription()

↓ need results to know what to read

TURN 2 β€” read

Read(main.py, symbol="handle")

Read(config.py, symbol="Config")

↓ need content to write correct code

TURN 3 β€” act (all at once with depends_on)

WritePlan

Edit(main.py)

Write(test.py)

RunTests()

3 turns = 3 Γ— context β†’ less tokens

2. Less Context = Less Tokens

The context is append-only. Condense early or pay forever.

The naive approach: keep everything

Each turn resends the entire conversation history. File reads, bash outputs, reasoning β€” nothing is thrown away.

β†’ API Call 1

sys

← Response: "I'll read main.py" + tool_call(Read)

β†’ API Call 2

sys

asst

Read(main.py) 400L

← Response: "Found bug at L42" + tool_call(Edit)

β†’ API Call 3

sys

asst

main.py 400L

asst

edit OK

← Response: "Running tests" + tool_call(RunTests)

β†’ API Call 4

sys

asst

main.py 400L

asst

edit

asst

pytest 200L

← Response: "all tests pass! Task done."

β–  cached prefix ($0.30/MTok)<br>β–  fresh content ($3.00/MTok)

Append-only: previous turns stay in place (cached), new content goes at the end (full price). The 400 lines of main.py are sent 3 times β€” you only needed 1 function.

With Snippets: same work, smaller context

You still pay to Read once. But instead of carrying 400 lines forever, you save a 20-line Snippet β€” the savings start next turn. Works with any tool result: Read, Grep, Skills, GetFolderDescription…

β†’ API Call 1 (same as naive)

← Response: "I'll read main.py" β†’ tool_call(Read)

β†’ API Call 2 (same as naive β€” you pay the 400L once)

Read(main.py) 400L FRESH

← Response: finds bug, emits Snippet(L40-60) + Edit(main.py)

β†’ API Call 3 (HERE the difference β€” 400L β†’ 20L Snippet)

Snippet 20L

edit OK

← Response: "done, running tests" β†’ tool_call(RunTests)

β†’ API Call 4 (still small)

20L

edit

pytest 200L

← Response: "all tests pass! Task done."

Same structure [S][U][A][...] β€” just 20L instead of 400L from Call 3 onward

With Methodology: append-only working memory

Each turn, the model appends a Methodology note (goal, plan, discoveries) to the cached prefix. Old tool_results are destroyed β€” only Methodology + Snippets + the original user message survive.

β†’ API Call 1

← Response: tool_call(Read) + Methodology#1

β†’ API Call 2 (400L paid once)

Meth#1

Snippet

Read(main.py) 400L FRESH

← Response: Methodology#2 +...

turn call read chars main response

Related Articles