Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them


R&A IT Strategy & Architecture


For reasons that will remain hidden, we resume writing about Generative AI/LLMs after a hiatus of 15 months (the pieces from October 2025 and June 2025 don't really count as serious ones). Today: the first of two articles about "coding with Large ‘Language’ Models", as coding is positioned as the ‘killer app’ for LLMs.

We interrupt this program for a short digression on Anthropic's recently released blog post ‘When AI builds itself’.

Did Anthropic perchance hire Google marketeers?

Anthropic's blog post is a masterclass in suggestive writing. The caveats are there, but hidden or sandwiched between more hyperbolic statements. A sentence like ‘we might be wrong’ is there, but what weight does a single sentence carry in thousands of words of text that assume they are not wrong? The benchmarks are suspect: a 50% or even 80% success rate on a coding task, compared to a human, is effectively useless in fully agentic (no human in the loop) coding. Is checking in 8 times as many lines of code per day really a good thing? What if every day you are replacing what wasn't OK the day before? What if LLMs edit in a way that makes lines of code an even less trustworthy measure than it already is? All in all, it reminds me of Google's deceptive talk about its ‘Willow’ quantum computing chip.
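One way to see why a per-task success rate of 50% or 80% is of little use without a human in the loop: if an agent has to chain several such tasks and any single failure spoils the result, the success rates multiply. The sketch below assumes the tasks are independent and equally hard, which is a simplification (real tasks are neither), and the step counts are illustrative, not taken from any benchmark.

```python
def chain_success(p: float, steps: int) -> float:
    """Probability that all `steps` tasks succeed, each independently with probability p.

    Assumes independence and that no human corrects failures along the way.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return p ** steps


if __name__ == "__main__":
    # Illustrative step counts only.
    for p in (0.5, 0.8):
        for steps in (1, 5, 10, 20):
            overall = chain_success(p, steps)
            print(f"p={p:.0%} per task, {steps:2d} tasks -> {overall:.1%} overall")
```

At 80% per task, ten chained tasks all succeed only about 11% of the time (0.8^10 ≈ 0.107), and at 50% per task it drops to about 0.1%.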

By the way, on that "Is checking in 8 times as many lines of code per day really a good thing?": a lot of my LLM-based check-ins have been like that, and frankly, I have increased my number of check-ins just to be able to backtrack when Claude Code has gotten lost. And that is despite all the work I did without intermediate check-ins: I still committed 7 times as much change as I ended up with in terms of lines of code… So ‘8’, as an estimate not so much of increased productivity but of increased overhead, does sound about right…
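A ratio like that "7 times" can be estimated from a repository's own history. The sketch below reads the output of `git log --numstat --format=` (one "added, deleted, path" line per file per commit) and divides the total lines added over all commits by the net lines that remain (added minus deleted, roughly the current line count if the history starts from an empty repository). Counting only added lines as "change" is an assumption; counting added plus deleted lines would give a larger number. The function names are hypothetical.

```python
import subprocess


def churn_ratio(numstat: str) -> float:
    """Ratio of lines added across all commits to the net lines that remain.

    `numstat` is the text printed by `git log --numstat --format=`:
    one "added<TAB>deleted<TAB>path" line per file per commit.
    """
    added = deleted = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # blank separator lines between commits
        a, d = parts[0], parts[1]
        if a == "-" or d == "-":
            continue  # binary files are reported without line counts
        added += int(a)
        deleted += int(d)
    net = added - deleted
    if net <= 0:
        raise ValueError("net line count is not positive; ratio undefined")
    return added / net


def repo_churn_ratio(repo: str = ".") -> float:
    """Run git in `repo` and compute the churn ratio over its whole history."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--numstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    return churn_ratio(out)
```

For example, a file that receives 70 new lines in one commit and loses 60 of them in the next gives `churn_ratio("70\t0\ta.py\n\n0\t60\ta.py\n") == 7.0`: seventy lines committed for ten that survive.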

We will return to this if I get around to writing the in-depth piece on ‘coding with Claude’ later.

And while we're at it: this is a long and winding piece (I am really having some trouble on this front, apologies), so let's provide:

TL;DR: It doesn't look like LLM-coding is going to be affordable (let alone for "AI to build itself")

I have been doing some experimenting. The experiment is: "How good is Claude Code anyway?" That experiment is still running, and Claude Code has by now created around 40k lines of code and a working (though incomplete) application. I hope to report on that experience in a while (it is a much more difficult write-up). In the meantime, I ran into the cost issue, which led to a short research project and a number of interesting observations and conclusions:

Let's start with an important observation: thanks to the combination of Claude Code and my own (rusty, but solid enough) programming background, I have been able to let Claude Code create this application (unfinished as of now, but functional), which I would otherwise not have been able to create in such a short amount of time; given my available time and energy, that would have meant ‘not at all’. For an experienced programmer, the initial experience is extremely impressive, as an experienced programmer knows how much of their own understanding normally goes into creating such code;

But… LLM-coding isn't economically viable for most uses. It is viable now because the subscriptions are heavily subsidised. If you are on the $100-a-month Claude Max plan and use it up to the weekly limit by going full ‘agentic coding’ (so almost no human in the loop), you consume an amount of tokens that would cost you more than $1000 at API pricing. Anthropic seems to be busy (Opus 4.7, 4.8) trying to stop that bleeding, and even if that succeeds without a loss of quality, it does signal an end to substantial improvements (i.e. the end of an S-curve);

And… while simple conversations with either budget or frontier models have indeed become ‘too cheap to meter’, the serious uses (like coding and complex reasoning) that require the recursive/indirection/tool-using/‘thinking’ (not) models have exploded so much in token use that they have become very expensive. A single task by a top recursive model at high effort is estimated to cost around $75 at API rates. I have seen a single query use one million tokens, which would mean at most $25 at API rates;

So… the economic model being presented to the world seems to be based on combining the value of the tasks that require a maximum amount of brute force to approximate good results on anything complex with hiding their cost or talking about ‘too cheap to meter’;

Hence: enjoy the music for as long as this ship hasn’t sunk, and prepare a good life raft.
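The arithmetic behind the cost figures in the observations above fits in a few lines. The sketch below bills input and output tokens at separate per-million-token prices; the defaults of $5 (input) and $25 (output) per million are assumptions for illustration, chosen so that $25 per million is the ceiling implied by the "one million tokens, at most $25" figure, and should be checked against the current price list. Caching and batch discounts are ignored, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token API prices in USD (assumed values; check the current price list)."""
    input_per_m: float = 5.0
    output_per_m: float = 25.0


def api_cost(input_tokens: int, output_tokens: int, pricing: Pricing = Pricing()) -> float:
    """USD cost of a request at API rates, ignoring caching and batch discounts."""
    return (input_tokens * pricing.input_per_m
            + output_tokens * pricing.output_per_m) / 1_000_000


def max_cost(total_tokens: int, pricing: Pricing = Pricing()) -> float:
    """Upper bound: every token billed at the most expensive rate."""
    return total_tokens * max(pricing.input_per_m, pricing.output_per_m) / 1_000_000


def subsidy_multiple(api_equivalent_usd: float, subscription_usd: float) -> float:
    """Dollars of API-priced usage bought per subscription dollar."""
    return api_equivalent_usd / subscription_usd


if __name__ == "__main__":
    print(max_cost(1_000_000))          # one-million-token query, upper bound
    print(api_cost(900_000, 100_000))   # hypothetical 90/10 input/output split
    print(subsidy_multiple(1000, 100))  # >$1000 of usage on the $100 plan
```

This prints 25.0, 7.0 and 10.0. The gap between the first two shows why a token count alone only yields an upper bound: the actual bill depends on how the tokens split between input and output.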

Here is part of a screenshot of the application I am building (and having some fun with). It's a real application which may support me...

