My AI coding flow was burning tokens to do things code should do

select * from random_thoughts


How Pi Agent and a few deterministic extensions fixed my flow

Geert Theys May 25, 2026

Most AI coding flows are getting more elaborate. Mine got simpler. If you spend most of your day in a coding agent and have started wondering whether some of it should just be a script, this is for you.

I am not going to cover MCP servers, skill authoring, or how Pi compares feature by feature to Claude Code. Plenty of other posts do that.

Credit where it is due: Robert Douglass built Spec Kitty and it is a good tool. It sits on top of a coding harness, follows Spec-Driven Development, and gives you governance and auditability. It will work for plenty of teams.

I went the other way. I started with an open source harness called opencode. I learned how it worked, and pretty quickly it started to feel restrictive.

Then I found Pi Agent. It is barebones: a read tool, a write tool, an edit tool, bash, and a way to connect your models. Everything else lives in extensions, and you extend it yourself. There is a prompt folder and a skills folder. No agents. That is the whole point. (If you want a longer take on why Pi’s minimalism is interesting, Armin Ronacher wrote one.)

When I was on opencode I leaned on commands and agents to run deterministic steps. The problem is that sometimes the LLM followed them and sometimes it did not, and either way it was burning tokens to do work that did not need a model in the loop. That is my main gripe with where a lot of these flows are heading: we are giving up determinism in the exact places it matters most. Some things should just be code.

Determinism

I am slowly moving away from pure command/agent/skill flows toward deterministic building blocks the LLM can call in a modular way. The result is more uniform delivery steps, and I no longer have to babysit every session to make sure the model is not running npm when AGENTS.md clearly says yarn, or skipping the commit step, or running the wrong test command.

A concrete example. At our company, GitHub Actions runs a SonarQube check for code coverage and security issues. I want those results fed to the LLM so it can propose fixes. My old approach was a command that told the LLM to pull the report and write it up. It worked, mostly. It also burned tokens and sometimes went off script, which meant I had to verify every step. So I rewrote it as a Pi extension that just runs.

Same story for code review. My codereview extension handles PRs, commits, folders, or uncommitted code. The prompt is built deterministically from whichever target you pick. Before, I would tell the LLM which commands to run and hope. Now it goes right every time.
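To make the code review idea concrete, here is a minimal sketch of what "the prompt is built deterministically from the target" could look like. It is not the actual extension and it does not use Pi's extension API. `ReviewTarget`, `diff_command`, `collect_diff`, and `build_prompt` are hypothetical names, and the prompt wording is invented for illustration. Treating a folder target as "uncommitted changes under that path" is also an assumption. The only real interfaces used are the `gh pr diff`, `git show`, and `git diff` commands.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewTarget:
    """Hypothetical description of what to review."""
    kind: str        # "pr", "commit", "folder", or "uncommitted"
    value: str = ""  # PR number, commit SHA, or folder path; unused for "uncommitted"


def diff_command(target: ReviewTarget) -> list[str]:
    """Map a target to the one command whose output gets reviewed."""
    if target.kind == "pr":
        return ["gh", "pr", "diff", target.value]
    if target.kind == "commit":
        return ["git", "show", target.value]
    if target.kind == "folder":
        # Assumption: "folder" means uncommitted changes under that path.
        return ["git", "diff", "HEAD", "--", target.value]
    if target.kind == "uncommitted":
        # Note: untracked files do not show up in this diff.
        return ["git", "diff", "HEAD"]
    raise ValueError(f"unknown review target: {target.kind!r}")


def collect_diff(target: ReviewTarget) -> str:
    """Run the command in code, so the model never has to pick it."""
    result = subprocess.run(
        diff_command(target), capture_output=True, text=True, check=True
    )
    return result.stdout


def build_prompt(target: ReviewTarget, diff_text: str) -> str:
    """Fill a fixed template; the same inputs always give the same prompt."""
    label = f"{target.kind} {target.value}".strip()
    return (
        f"Review the following changes ({label}).\n"
        "Report bugs, security issues, and missing tests. "
        "Do not run any commands.\n\n"
        f"{diff_text.strip()}\n"
    )
```

In use, something like `build_prompt(t, collect_diff(t))` with `t = ReviewTarget("pr", "42")` would produce the text handed to the model. The model only sees the finished prompt. Choosing and running the command happens in code, which is the part that used to go off script.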

The rifle creed in my repo is there for the same reason. It reminds me that the tools I trust most are the ones I built and understand.

The best way to think about any of this is as dotfiles. Do not copy someone else’s dotfiles wholesale, they are theirs. Browse them for ideas, take what fits, build your own. Same with the Pi Agent extensions. This is my flow. Use it for inspiration, copy what works, but build your own.

Management

Spec Kitty has a nice setup for managing your flow. So does every other tool out there, each in its own ecosystem. I work across many projects with different issue trackers, so I fall back on open source that already solved the problem. Taskwarrior is my local representation. Bugwarrior syncs most of the known issue trackers into it. Hook it up, sync, done. Sometimes you do not need another tool. You just need to use the one that exists.

Token costs

I saved enough tokens that I fell off our internal burn leaderboard, which thankfully is not a metric for LLM adoption at our company. Three places are worth looking at.

Chatting. Every message sends the system prompt, your skills, your MCPs, and the agent prompt along with the actual message. Caching helps, but caching is not free either. This is where caveman comes in, an excellent skill built by a Dutch student. I have been running it for three months and recommend it.

Tools. I used to use RTK-AI, but the LLM would sometimes complain that commands were truncated and things broke. RTK-AI lets you exclude commands in config to work around it, but I switched to condensed-milk and have had fewer issues overall.

Compaction. I always thought it was a bit silly to use an LLM to compact a prompt, since you are burning tokens to save tokens. VCC does this locally. It does not summarize, it actually compacts the message through normalization and references certain blocks for retrieval if needed (via vcc-recall). No extra model calls, no lost context.

One last thing

None of this is a framework. It is a set of choices that fit how I work, written down so I can keep making them on purpose. If something here is useful to you, take it. If not, the better...
