CodeGraph on Hono: the tool-call savings reproduce, the cost savings don't

I Tested CodeGraph on Hono. The Tool-Call Savings Reproduce — the Cost Savings Don't. | HarrisonSecI Tested CodeGraph on Hono. The Tool-Call Savings Reproduce — the Cost Savings Don't. Independent CodeGraph benchmark on Hono (~280 TS files): -55% tool calls reproduces the published claim, but cost is a wash (+7%), not -35%. Raw CSV included.

June 1, 2026

Harrison Guo

19 min read

AI Agent Production Engineering Tool Evaluations Table of Contents

Two weeks ago CodeGraph hit GitHub trending — tree-sitter + SQLite/FTS5 + MCP for Claude Code, 19k+ stars in a week. The team published a benchmark on 7 repos showing 35% cheaper, 57% fewer tokens, 46% faster, 71% fewer tool calls vs. baseline. Those are big numbers. They’re also numbers from a benchmark designed by the team that built the tool, on repos they chose. Designer bias is the #1 risk in any retrieval benchmark — when you pick the test repos and write the ground truth, you’ll consciously or unconsciously favor your own tool’s strengths. So I ran an independent test on an 8th repo — Hono (TypeScript, ~280 source files, in neither CodeGraph’s published 7-repo suite nor any other published benchmark I could find). 5 architectural questions covering different retrieval shapes, with a deliberate control case (Q5) where the tool should not win. Two conditions (baseline grep+Read+Glob+Explore vs. CodeGraph active), 4 repeats per question per condition. 40 runs on Claude Opus 4.8 — and, critically, every CodeGraph run was verified to have connected, and actual codegraph_* tool usage was recorded per run (more on why that sentence exists below). The result splits in a way the single published headline number hides — and the split is the useful part. tl;dr — On Hono, CodeGraph delivers a large, consistent reduction in tool calls (-55%, 14.0 → 6.3 avg) and a smaller latency win (-20%) — the published 7-repo direction reproduces here. But cost is a wash: +6.8% , not the published −35%. On narrow-scope questions (route lookup, middleware trace) CodeGraph is actually 20-43% more expensive , because each structural lookup loads a big chunk of graph context that costs more in cached tokens than the grep round-trips it replaces. The cost win only appears on broad multi-file navigation (Q3 multi-runtime adapters: −29% cost, −80% tool calls, −53% latency ). A second finding: baseline grep+Read has high variance — the agent occasionally spiraled to 47-52 tool calls on the broad questions, while CodeGraph never exceeded 16. Net at Hono’s size: CodeGraph makes the agent take fewer steps and finish faster, but not for fewer dollars. Total cost of the 40 valid runs: ~$14 of Opus 4.8 calls. Raw per-run CSV and the 5 verbatim prompts are below.

What “tool calls down, cost flat” actually means CodeGraph’s published 7-repo suite (VS Code, Excalidraw, Django, Tokio, OkHttp, Gin, Alamofire) skews larger and more architecturally complex than Hono. Hono is ~280 TypeScript source files (362 files indexed by CodeGraph, including tests and configs), 16MB on disk — small enough that a thoughtful agent with grep + Read can finish most architectural questions in a handful of tool calls. The interesting result is that the axes come apart. CodeGraph replaces several grep+Read round-trips with one or two structural lookups — so step count drops hard (-55%) . But each codegraph_context / codegraph_explore call returns a sizeable chunk of graph context, which then rides along in the conversation cache and gets re-read every turn. At Hono’s size, the dollar cost of carrying that cached payload roughly equals the dollar cost of the grep round-trips it replaced — so dollars stay flat (+7%) even as steps fall by more than half . That’s not a contradiction of the cost-curve thesis from the prior post in this mini-series — it’s a sharper reading of it. Hono sits above the step-count crossover (the index already saves tool calls) but below the dollar crossover (it doesn’t yet save money). On a much bigger repo, the grep path churns through far more files and the index pays back on dollars too. Hono just happens to land in the gap between the two crossovers. A useful complementary benchmark answers three things the published one doesn’t: Cross-validation on a repo not chosen by the tool’s team — do the published advantages generalize? Within-repo variance across question types — does the win concentrate on certain question shapes? (It does — heavily.) A control case where the tool shouldn’t win — Q5 (text search) tests whether the agent correctly declines to use the structural engine when grep is the right tool. Setup — install CodeGraph, ~10 minutes Reading this? Get the next one in your inbox.

Subscribe # install (downloads a single binary, no Node/npm required) curl -fsSL https://raw.githubusercontent.com/colbymchenry/codegraph/main/install.sh | sh

# clone the test repo + index it git clone...

CodeGraph on Hono: the tool-call savings reproduce, the cost savings don't

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

It's Not Just X. It's Y

Amazon, Facebook, FBI have access to a private intelligence-sharing network

Show HN: GoPeek – open links in live mini browser windows without new tabs

Agent Memory: An Anatomy