I built a small tool to reduce input token costs by 20-30% for agentic tasks

I left Cody. Here's how I made my BYO-API-key AI setup not suck on big repos. — Big Indexer

Walkthrough 2026 · 8 min read

A walkthrough for the people scrolling through r/ClaudeAI, r/LocalLLaMA, and the Continue Discord asking "what's a good Cody alternative now that AMP charges per line?"

If you're reading this, you probably already know the story. Sourcegraph quietly retired Cody and replaced it with AMP, which charges per line of code generated. Cursor caps your usage at $20. Copilot's pricing is opaque. The whole "flat-rate AI coding assistant" market just admitted that the economics don't work at current LLM token prices.

Meanwhile, the BYO-API-key crowd — Continue, Aider, Claude Code with the claude CLI, Cline — is sitting on a $5–20/month Anthropic or OpenAI bill and getting nearly the same value, if they can get the AI to actually understand their codebase. That last part is where it falls apart on big repos. The AI reads files randomly. It misses the architecture. It suggests changes that cross boundaries it shouldn't.

This is a write-up of what's actually been working for me to close that gap, with concrete numbers from a 100-run study and reproducible code. I'm the maintainer of Big Indexer, so I'm not exactly unbiased — but the validation data is public, the loss cases are documented, and you can run the whole thing locally on your own repo in 5 minutes.

The actual problem with BYO-API-key setups on big repos

Continue, Aider, and friends do retrieval. They embed your codebase, pull a few chunks per query, and send those to the AI. That works fine for "what does this function do", but it falls over on architectural questions:

"Where should I add this feature without leaking responsibilities across modules?"

"What's the blast radius if I change this route handler?"

"Are there other places in the repo that solve a similar problem I should mirror?"

Embedding-based retrieval doesn't have a model of architecture. It has a model of textual similarity. Those aren't the same thing. The result: the AI gives you syntactically reasonable code that crosses the wrong boundaries, picks the wrong abstractions, and reads ten files when it should have read two.
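To make that difference concrete, here's a toy sketch. It is not how Continue, Aider, or BGI work internally: word-count cosine similarity stands in for real embeddings, and every file name, chunk text, and import edge below is made up for illustration. The point is only that a similarity ranking and a dependency walk answer different questions.

```python
import math
from collections import Counter, deque

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: dict, k: int = 2) -> list:
    """Rank chunk names by textual similarity to the query (toy stand-in for embeddings)."""
    q = Counter(query.lower().split())
    ranked = sorted(chunks, key=lambda name: cosine(q, Counter(chunks[name].lower().split())), reverse=True)
    return ranked[:k]

def blast_radius(changed: str, imported_by: dict) -> set:
    """Everything that transitively depends on `changed`, via breadth-first search."""
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dependant in imported_by.get(node, []):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen

# Hypothetical repo: chunk text keyed by file, plus reverse import edges.
chunks = {
    "routes/upload.py": "def upload handler route validates file and calls storage save",
    "routes/download.py": "def download handler route reads file from storage",
    "billing/hooks.py": "def meter wraps callable and records bytes per account",
    "docs/routing.md": "route handler guide how to write a route handler",
}
imported_by = {
    "routes/upload.py": ["app.py", "billing/hooks.py"],
    "billing/hooks.py": ["billing/report.py"],
}

similar = top_k("change the upload route handler", chunks)
affected = blast_radius("routes/upload.py", imported_by)
print("retrieved by similarity:", similar)
print("affected by the change: ", sorted(affected))
```

In this toy setup the similarity ranking surfaces the routing guide and the handler itself, while the billing modules that depend on the handler share no words with the query and only show up in the graph walk.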

Sourcegraph knew this — that's why Cody had a "structural" code graph backing it. But it was a managed service, your queries went to their cloud, and the per-line economics became untenable.

What I tried instead

A small open-source tool called Big Indexer (BGI for short) that does one job: scan your repo, build a behavioral graph, and expose it over MCP (Model Context Protocol). MCP is the standard your AI assistant probably already speaks: Continue, Cursor, Claude Desktop, and Cline all support it.

The pitch: BGI is not an AI coding assistant. It runs alongside whichever one you've picked. It's the architecture-aware context layer that gives the AI a real model of your codebase's structure, so it stops file-fishing and starts giving you architecturally sane suggestions.

Cost: free, local, Apache 2.0. No service in the loop. No per-token fee.

Setup, in 5 minutes

    pip install bigindexer
    cd /path/to/your/big/repo
    bgi scan . --lang auto --out bgi-graph.json --fuse-graph fuse-graph.json

Then add it to whichever MCP client you're using. For Continue, that's an entry in ~/.continue/config.json:

    "experimental": {
      "modelContextProtocolServers": [
        {
          "name": "bgi",
          "transport": {
            "type": "stdio",
            "command": "bgi",
            "args": [
              "mcp",
              "--graph", "/abs/path/bgi-graph.json",
              "--fuse-graph", "/abs/path/fuse-graph.json"
            ]
          }
        }
      ]
    }

Restart Continue, and it has 12 new MCP tools available. The three I actually use:

task_fingerprint(task) — convert a natural-language task into the repo's behavioral vocabulary

behavioral_twins(task) — find the top 3 in-repo code units already doing similar work

twin_context(task) — combined: fingerprint + twins + the seam where they connect + a 5-point safety rubric
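Your MCP client makes these calls for you, but it can help to see roughly what goes over the wire. MCP messages are JSON-RPC 2.0, and tools are invoked with the tools/call method. The sketch below builds such a request; the argument key "task" is my assumption based on the signatures above, and the example task string is made up.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched

def tools_call(tool: str, task: str) -> str:
    """Serialize an MCP tools/call request for one of the task-based tools."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        # "task" as the argument name is assumed from the tool signatures.
        "params": {"name": tool, "arguments": {"task": task}},
    }
    return json.dumps(request)

message = tools_call("twin_context", "add rate limiting to the upload route")
print(message)
```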

The full setup walkthrough with more detail (Claude Desktop, Cursor, Cline configs too) is in docs/MCP_WITH_CONTINUE.md.
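If you'd rather script the Continue config change than hand-edit JSON, a minimal sketch looks like this. It only builds and merges the entry from the setup step in memory; the helper names are mine, the `existing` dict is a stand-in for whatever your config.json already contains, and reading or writing the real file is left out on purpose.

```python
import json

def bgi_server_entry(graph_path: str, fuse_graph_path: str) -> dict:
    """Build the MCP server entry from the setup step as a plain dict."""
    return {
        "name": "bgi",
        "transport": {
            "type": "stdio",
            "command": "bgi",
            "args": ["mcp", "--graph", graph_path, "--fuse-graph", fuse_graph_path],
        },
    }

def add_server(config: dict, entry: dict) -> dict:
    """Return a copy of config with the entry added, replacing any same-named entry."""
    merged = json.loads(json.dumps(config))  # cheap deep copy for JSON-only data
    servers = merged.setdefault("experimental", {}).setdefault("modelContextProtocolServers", [])
    servers[:] = [s for s in servers if s.get("name") != entry["name"]]
    servers.append(entry)
    return merged

existing = {"models": []}  # hypothetical stand-in for the parsed config.json
entry = bgi_server_entry("/abs/path/bgi-graph.json", "/abs/path/fuse-graph.json")
updated = add_server(existing, entry)
print(json.dumps(updated, indent=2))
```

Round-tripping through json.dumps is also a quick way to catch an unbalanced bracket before Continue does.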

What actually changes

The validation page has 100 scored runs across 5 production OSS repos (django, fastapi, prometheus, pydantic-core, next.js) with three different models (deepseek-v4-flash, GPT-4o, Gemini auto). Raw outputs are committed, so you can re-score them yourself.

Headline numbers:

| Metric | No BGI | With BGI MCP context |
|---|---|---|
| Median agent latency | 133.8s | 66.2s |
| Boundary accuracy | 0.95 | 1.00 |
| Actionability (1–5) | 4.00 | 4.75 |
| Hallucination rate | 0 | 0 |

The latency drop is the cost story. The median agent run roughly halved, from 133.8s to 66.2s. That's because the AI read fewer files: BGI told it which ones mattered. That translates to roughly 20–30% lower input-token cost on big-repo agentic tasks (this depends entirely on your starting setup; don't believe specific cost claims without measuring on your own repo).
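For a rough sense of what those percentages mean on your own bill, here's a back-of-envelope sketch. The two latency figures come from the table; the tokens per task, tasks per month, and price per million input tokens are placeholders I made up, so swap in your own measurements.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative drop from before to after, as a percentage."""
    return (before - after) / before * 100.0

def monthly_input_cost(tokens_per_task: int, tasks_per_month: int, usd_per_million: float) -> float:
    """Input-token spend per month in USD."""
    return tokens_per_task * tasks_per_month * usd_per_million / 1_000_000

# Median latencies from the validation table.
latency_cut = percent_reduction(133.8, 66.2)
print(f"median latency reduction: {latency_cut:.1f}%")

# Placeholder workload: replace all three numbers with your own.
baseline = monthly_input_cost(tokens_per_task=50_000, tasks_per_month=100, usd_per_million=3.00)
print(f"baseline input cost: ${baseline:.2f}/month")
for saving in (0.20, 0.30):
    print(f"  at {saving:.0%} fewer input tokens: ${baseline * (1 - saving):.2f}/month")
```

The useful part is the shape of the calculation: measure input tokens per task with and without the extra context on your own repo, then plug both in.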

The boundary and actionability numbers are the answer-quality story. Both moved up. That's the part that matters more — a wrong...
