A cheaper and safer agentic AI workflow

A cheaper and safer agentic AI workflow - danuker | freedom & tech

Toggle navigation

danuker | freedom & tech

Contact/About

Privacy statement

Quotes

English

Română

I recently tried agentic coding for real. It cost $0.034 and finished in 3 minutes. It made two mistakes. In my personal human attempt, I took an hour, and made four mistakes.

Cheaper model services

I heard about GLM-5.2, and a lot of benchmarks are saying it's on par with the leading proprietary AIs of just 3 months ago. On the same benchmark site I had discovered GMI Cloud, a model service.

I created an account and received $5 in free credits last year. I see the minimum deposit is $10 nowadays. That's fine for me too.

I create an API key on their service.

I am not too keen on giving a Singaporean model hosted by a US company on data centers scattered throughout the world access to my private data. So I installed Debian in a VirtualBox image, and installed pi and the Guest Additions on it. Then I shared a copy of my project as a Shared Folder. Nothing else.

I configured pi and unleashed GLM-5.2 on the folder. 5 minutes and $0.435 later, the agentic sanity test worked. I asked it to look through various data files of various formats and create an index.tsv with information of interest. It did a perfect job.

Optimizing even further

So did Qwen3.6-35B-A3B-Q4_K_XL from Unsloth on my CPU, but it took more than an hour (and my time and interactivity is worth way more than $0.435 per hour). But how cheap could I go? Looking at what else GMI has to offer, DeepSeek V4 Flash catches my eye. It looks like it's a tiny bit more verbose than GLM-5.2, so the same number of tokens per task, but less than a 10th of the cost. Can it still perform my task?

I replace zai-org/GLM-5.2-FP8 with deepseek-ai/DeepSeek-V4-Flash and rerun the test.

Done in 3 minutes and $0.034. It shows a tiny bit of imperfection: it made 2 mistakes. Some irregular data series are shown as "daily" though they've got 5-ish-day and 2-ish-day periods. But other than that it's fine. I also noticed deepseek-ai/DeepSeek-V4-Pro, which is somewhere in the middle. Zero mistakes on my test, but took 2 mins 27s and $0.229. I think this is the one I will keep instead of GLM, but I will mostly use V4-Flash.

My ~/.pi/agent/models.json ended up like so:

"providers": { "ollama": { "baseUrl": "https://api.gmi-serving.com/v1", "api": "openai-completions", "apiKey": "Almost free but not free. Very, very cheap.", "compat": { "supportsDeveloperRole": false, "supportsReasoningEffort": false }, "models": [ "id": "deepseek-ai/DeepSeek-V4-Flash", "reasoning": true, "contextWindow": 262144 }, "id": "deepseek-ai/DeepSeek-V4-Pro", "reasoning": true, "contextWindow": 262144

Especially considering that I made 4 mistakes, and that it took me a bit more than an hour. Curse the mm/dd/yyyy format! It seems I have been thoroughly bested at that task. I feel like adjusting my career path and keeping up with the times.

Bonus: Go even cheaper: Every so often, my models stumbles into a huge one-line JSON, and runs up the token count filling up pi's 50KB DEFAULT_MAX_BYTES limit. I changed that limit to 5KB, significantly reducing input token count. There is a ticket to introduce this as a setting, but it was auto-closed. The files to modify (with the pi version as of writing this) are:

~/.local/share/pi-node/node-v22.22.3-linux-x64/lib/node_modules/@earendil-works/pi-coding-agent/node_modules/@earendil-works/pi-agent-core/dist/harness/utils/truncate.js ~/.local/share/pi-node/node-v22.22.3-linux-x64/lib/node_modules/@earendil-works/pi-coding-agent/dist/core/tools/truncate.js

I modified both (not sure if I needed to). Prompt tokens for DeepSeek-V4-Flash went from 604k to 431k, and total cost went from $0.034 to $0.026 for my particular test.

The future

My work now changed significantly. No longer do I manually copy paste tiny code segments, instead I ask the agent what to do, then compare the agent's shared directory with the main one. I do this with PyCharm for its good diff directory interface, but you can do it with Meld as well.

So there you have it. I am reaping the AI rewards, while refusing to give in to vendor lock-in. I despise closed ecosystems and enshittification. When Anthropic started pushing for Claude Code exclusivity, I found that anticompetitive. Also, arbitrary and sudden price increases are reckless while open weights models are just a few months behind. They are desperately trying to extract value from rapidly devaluing models. If the breakneck pace slows down, their value evaporates. The moat they are trying to build is actually slowing themselves down.

The hardware is where it's at. Companies like GMI and DeepInfra provide flexible value. And the hardware depreciates in at least a few years, rather than 3 months. I don't know how to monetize models sustainably, however. Maybe crowdfunding? Non-profits? Public utilities like firefighters?

TL;DR / summary

Get a model API —...

A cheaper and safer agentic AI workflow

Related Articles

Apple WWDC 2026 Livestream

Claude Fable 5

US Government directive to suspend access to Fable 5 and Mythos 5

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org