@adlrocha - The Real Cost of Using AI in 2026
@adlrocha Beyond The Code
The math says that I shouldn’t be building my own AI rig, and why I am building one anyway
adlrocha
Jun 28, 2026
A few weeks ago I wrote about the shift from GPU-poor to token-poor. Since that post, and the ones I wrote about my recent obsession with AI independence, a lot of people have asked me for advice on how they should access intelligence: “Fine, but what should I actually do? Buy a subscription? Pay per token? Build a rig? Rent one?” I dodged the economics in the token-poor post, so let’s do them properly now.

The first thing I did before writing this post was pull my own token bill for the last 60 days (which have actually been slower than usual) to model my own token consumption (side note: I built a really cool tool for this in nibble, my agent harness, that I can talk about in a future post if anyone is interested).
91% of my token spend went to expensive models I cannot run at home (and never will, because of their size and because they are closed). But I already held a yearly subscription, so why not use them first? The open models I “could” (let me keep the quotes for now) host myself, the Qwens, the DeepSeeks, the GLMs and the Kimis, cost me around $30 over two months. Just. Thirty. Dollars.

This gap is what actually motivated this exercise and this whole post. What if I didn’t have that Claude subscription? How much would my access to intelligence have cost me, and what are the alternatives?
The four ways to buy intelligence
There are four ways (at least that I could come up with) to get tokens out of a large language model.

The first (and probably the most widely used) is a subscription. You pay a flat monthly fee, and someone else runs the model. This is ChatGPT Plus, Claude Pro, or the Kimi/GLM coding plans. Simple, capped, predictable and, as we’ll see, heavily subsidised.

The second is pay-per-token via an API. No flat fee: you pay for exactly what you consume, priced per million input tokens and per million output tokens. This is generally what you use when you need to power your application with AI, or when you route an agent through a serverless LLM provider like OpenRouter, Fireworks or Together. Correct me in the comments, but I would say people who lean on agents day to day prefer the predictability of a subscription over paying per token.

The third is renting a GPU in the cloud. You spin up a machine by the hour, load whatever open-weight model you like, and serve it yourself: RunPod, Vast, Lambda. You’re not paying for tokens, you’re paying for raw compute time, so you no longer need to think about how many tokens you are consuming.

The fourth is owning the hardware. You buy the silicon, it sits in your house, and the marginal cost of a token is your electricity bill. As you know, this is the quest I’ve been on for a while now.

I really think each of these wins in a different regime. The tricky thing is knowing which regime you’re in.
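To make “regime” a bit more concrete, here is a minimal sketch that models each option as a yearly cost that depends on usage, and then picks the cheapest. Every price, cap and throughput figure in it is a made-up placeholder, not a quote from any provider, and the helper names (`subscription`, `pay_per_token`, `cloud_rental`, `owned`, `cheapest`) are hypothetical. The point is only the shape of each curve: flat up to a cap, linear in tokens, linear in GPU hours, and mostly fixed.

```python
import math
from typing import Callable, Dict

# Each option maps yearly usage (millions of tokens) to a yearly cost.
# All numbers used below are hypothetical placeholders for illustration.
CostModel = Callable[[float], float]

def subscription(monthly_fee: float, cap_mtok: float) -> CostModel:
    # Flat fee, but only usable up to the plan's cap; beyond it, not an option.
    return lambda mtok: 12 * monthly_fee if mtok <= cap_mtok else math.inf

def pay_per_token(price_per_mtok: float) -> CostModel:
    # Linear in tokens consumed.
    return lambda mtok: mtok * price_per_mtok

def cloud_rental(hourly_rate: float, mtok_per_hour: float) -> CostModel:
    # Linear in GPU hours; tokens only matter through how long the GPU runs.
    return lambda mtok: (mtok / mtok_per_hour) * hourly_rate

def owned(upfront: float, years: float, yearly_power: float) -> CostModel:
    # Mostly fixed: hardware amortised over its life, plus electricity.
    return lambda mtok: upfront / years + yearly_power

def cheapest(options: Dict[str, CostModel], mtok: float) -> str:
    # Name of the option with the lowest yearly cost at this usage level.
    return min(options, key=lambda name: options[name](mtok))

options = {
    "subscription": subscription(monthly_fee=20, cap_mtok=500),
    "api": pay_per_token(price_per_mtok=1.0),
    "rental": cloud_rental(hourly_rate=2.0, mtok_per_hour=1.0),
    "owned": owned(upfront=3000, years=3, yearly_power=30),
}

for mtok in (100, 400, 5_000):
    print(mtok, cheapest(options, mtok))
# Expected output:
# 100 api
# 400 subscription
# 5000 owned
```

With these placeholder numbers, light usage favours the API, moderate usage favours the subscription until its cap, and heavy usage favours owned hardware; rental never wins here because its assumed cost per million tokens is higher than the API’s. Change the inputs and the boundaries move.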
The numbers, at real usage
For this exercise, I tried to be as objective and practical as possible, so let me put my own usage through all four options and show the maths. My pattern is moderate and mixed: coding, writing and research, a few hours a day, heavy on context. I don’t run token-heavy long-running loops, and all my LLM-powered crons are routed through my local Qwen (not considered in this analysis). Of all my consumption, roughly 78 million input tokens and 13 million output tokens a year fall on the replaceable tier: the open models I could plausibly self-host. For personal and professional reasons this period has been slower than usual, but that gives me a good baseline, a floor for my usage.

Here’s what that year costs, four ways:

Pay-per-token API: DeepSeek V4 Flash at $0.14 in / $0.28 out per million tokens comes to about €13 a year. At my current, messier mix of open models, call it €130. Either way, low triple digits at most.
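The API line is just two multiplications, because prices are quoted per million tokens. Here is a minimal sketch with the yearly volumes and DeepSeek V4 Flash prices from the paragraph above; the EUR/USD rate of 0.90 is my assumption, not a figure from the post, so swap in the current rate.

```python
# Yearly token volumes on the replaceable tier, from the text.
INPUT_MTOK = 78   # millions of input tokens per year
OUTPUT_MTOK = 13  # millions of output tokens per year

# DeepSeek V4 Flash prices quoted in the text, in USD per million tokens.
PRICE_IN_USD = 0.14
PRICE_OUT_USD = 0.28

# Assumed exchange rate (not from the post).
EUR_PER_USD = 0.90

def yearly_api_cost_usd(input_mtok: float, output_mtok: float,
                        price_in: float, price_out: float) -> float:
    # Input and output tokens are billed at different per-million rates.
    return input_mtok * price_in + output_mtok * price_out

usd = yearly_api_cost_usd(INPUT_MTOK, OUTPUT_MTOK, PRICE_IN_USD, PRICE_OUT_USD)
eur = usd * EUR_PER_USD
print(f"${usd:.2f} per year, about €{eur:.0f}")
# Expected output: $14.56 per year, about €13
```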
Cloud GPU rental: an MI300X, with enough memory to hold a 100GB model, runs about $1.99 an hour on RunPod. Realistically, we would need at least 200GB of VRAM to run something at the level of the open models I use through the API. At ninety hours a month that’s roughly €2,300 a year, plus storage so you don’t re-download the weights on every cold start.
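Two pieces of arithmetic sit behind the rental line: how much memory the weights need, and what the hours add up to. Here is a rough sketch, assuming the weights dominate memory (the KV cache for long contexts comes on top) and using a hypothetical 200-billion-parameter model purely for illustration; the helper names are made up.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    # Weights only: parameters x bits / 8 bytes. KV cache and activations are extra.
    return params_billion * bits_per_weight / 8

def yearly_rental_usd(hourly_rate: float, hours_per_month: float) -> float:
    # Pay for wall-clock GPU time, twelve months a year.
    return hourly_rate * hours_per_month * 12

# Hypothetical 200B-parameter model quantised to 4 bits: about 100GB of weights.
print(weights_gb(200, 4))   # 100.0
# The same hypothetical model at 8 bits needs about twice that.
print(weights_gb(200, 8))   # 200.0
# MI300X rate and ninety hours a month from the text, in USD.
print(f"{yearly_rental_usd(1.99, 90):.2f}")  # 2149.20
```

That last figure is compute time alone, in dollars, before storage.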
Own hardware: a usable DIY server is around €2,900 up front for one GPU, and after that maybe €30 a year in electricity. I’ve been looking to build myself an AMD-based rig with at least four GPUs equivalent to an RTX 3060 or RTX 5090, and that takes you to around €12,500. A pair of DGX Sparks, the configuration people actually want, is €9,600. I also love the tinybox red v2, but that’s $12,000 for only 64GB of GPU RAM (and I’m not sure how upgradable it is without a lot of tinkering).
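One way to compare a one-off purchase against a yearly bill is a break-even time: the upfront cost divided by what you save each year. Here is a minimal sketch using the post’s euro figures. It ignores resale value, hardware failure and whether the box can actually run the same models, and it reuses the €30 electricity figure for the four-GPU rig, which likely understates that rig’s power bill. `breakeven_years` is a hypothetical helper.

```python
import math

def breakeven_years(upfront: float, yearly_power: float, alternative_yearly: float) -> float:
    """Years until owning beats paying `alternative_yearly` every year."""
    saving_per_year = alternative_yearly - yearly_power
    if saving_per_year <= 0:
        return math.inf  # electricity alone costs as much as the alternative
    return upfront / saving_per_year

# Euro figures from the post.
print(round(breakeven_years(2_900, 30, 2_300), 1))   # 1-GPU box vs cloud rental -> 1.3
print(round(breakeven_years(2_900, 30, 130), 1))     # 1-GPU box vs API (messier mix) -> 29.0
print(round(breakeven_years(12_500, 30, 130), 1))    # 4-GPU rig vs API -> 125.0
```

Against rental, owning pays for itself in a little over a year; against the API bill it takes decades, which is the maths the subtitle is talking about.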
Subscription: whatever your flat fee is, for models the other three options can’t touch...