AI Capability Theater: Why Everyone’s Driving a Ferrari | by Tim O'Brien | Jun, 2026 | Medium
Would Greenspan call this Irrational Intelligence?
Tim O'Brien
8 min read
Drive to work tomorrow and look around. You’re not surrounded by Ferraris. You’re surrounded by Subarus, Fords, and Toyotas: cars that get the job done, reliably, at a price that makes sense for what they’re asked to do. Most people have never been in a Ferrari. They exist, they’re fast, they’re absurdly expensive, and almost nobody needs one. I’m willing to bet you don’t drive a Lotus or a Ferrari to work.
I do see a few Ferraris. There’s a Ferrari dealership a few miles from my house, and every time I spot one of those half-million-dollar machines, I think about what it’s actually being used for: going nowhere, slowly, in the same interminable traffic jam on I-405. A machine built to do 200 miles an hour, idling in a sea of brake lights, waiting to merge.
Most people reading this would agree that using a Ferrari for your daily commute is a waste of money and fuel.
Why did you just draft that email with a Ferrari?
But some of you probably just asked Claude Opus 4.8 or Sonnet 4.6 to revise a sentence or a two-paragraph email, and the infrastructure behind your multi-GPU grammar edit makes a Ferrari look cheap. That rack of GPUs you were using to test out Fable 5 last week? It costs many times more than a Ferrari. And I know a lot of people who just stay on one high-powered model all day.
Most people don’t even think about it; they just leave the system on Opus or GPT-5.5 all day. Stopping to think about each specific task feels like too much work.
We’re in the middle of a subsidized AI space race, and everyone’s happy to climb into the most powerful model available because the cost of falling behind feels too high. Until more people understand what these models actually cost, in money and in energy, we’re not going to have a serious conversation about any of it.
Capability theater, parking lot edition. (Image Assist from ChatGPT)
The Ferrari problem: Why you don’t need Opus
To fix a grammar problem, you can use Haiku: a fraction of a second on a fraction of a compute instance. Or you can send that same sentence to a multi-agent process running on Fable 5, which ties up most of a rack for a minute or two. Same mundane result, with costs and energy consumption that differ by orders of magnitude.
The numbers are concrete: editing a paragraph with GPT-5 mini costs about $0.002. Send the same prompt to GPT-5.5 with a fat context window, and you’re at $0.35, roughly 175 times as much. That gap compounds fast: a day of using the wrong model can run to hundreds of dollars.
Most people never see it because they don’t pay for it directly at work. They open their AI tool, type a prompt, and get an answer. The model that responds is whatever the tool defaults to, usually the most expensive one available. The tool company wants you on the flagship because it makes the product feel capable. The model provider wants you there because that’s where the margin is. Nobody in this chain has a reason to steer you toward the cheaper option.
That’s especially true for model providers. Use more power than you need, and it’s going to show up as more evidence that everyone needs their latest, most expensive model. “Please, burn more tokens.”
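A quick back-of-the-envelope sketch of how that per-edit gap compounds over a day. The two prices are the ones quoted above ($0.002 for GPT-5 mini, $0.35 for GPT-5.5 with a large context window); they are the article's illustrative figures, not official list prices. The 1,000-edits-per-day workload and the helper names `cost_ratio` and `daily_cost` are hypothetical, chosen only to make the arithmetic visible.

```python
# Per-edit costs quoted in the article (illustrative figures, not official price lists).
SMALL_MODEL_COST = 0.002  # GPT-5 mini, USD per paragraph edit
FLAGSHIP_COST = 0.35      # GPT-5.5 with a large context window, USD per edit


def cost_ratio(cheap: float, expensive: float) -> float:
    """Return how many times more the expensive call costs than the cheap one."""
    return expensive / cheap


def daily_cost(cost_per_call: float, calls_per_day: int) -> float:
    """Return the total spend for a day of identical calls."""
    return cost_per_call * calls_per_day


if __name__ == "__main__":
    # Hypothetical workload: 1,000 small edits in one day (an assumption, not a measurement).
    calls = 1000
    ratio = cost_ratio(SMALL_MODEL_COST, FLAGSHIP_COST)
    print(f"Flagship costs {ratio:.0f}x more per edit")
    print(f"Small model: ${daily_cost(SMALL_MODEL_COST, calls):.2f} per day")
    print(f"Flagship:    ${daily_cost(FLAGSHIP_COST, calls):.2f} per day")
```

Under those assumptions the flagship comes out about 175 times more expensive per edit: roughly $350 a day against $2 a day. Changing `calls` to match your own usage moves the daily totals, but the ratio stays the same.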
I’ve been experimenting with GLM-5, mid-tier OpenAI models, and Gemini’s smaller offerings. I still reach for Opus or GPT-5.5 a couple of times a day, not because I’ve evaluated the task and decided I need them, but because I’m also guilty of wanting to drive the Ferrari. A task seems too complex or too important, and the instinct is to throw the biggest model at it. Even when I could turn it into a smarter, multi-step agentic process, sometimes I’ll just dial it up to Opus and let it go.
And that instinct is usually wrong. I forced myself to use the cheaper models for an entire week. The frontier model wins at the edges: it handles the hard cases more gracefully and needs less cleanup. But most of the time, the cheaper model does the job. The capability gap is smaller than you might think.
When you’re using a frontier model to fix a comma or summarize a meeting, you’re not doing better work. That’s capability theater, and almost nobody is calling it what it is. People are going on feeling, and that feeling is shaped by marketing, not measurement.
There’s another odd thing happening here: a sort of confirmation bias. As a user, you dial it up to Opus or Fable and expect better results. I can’t tell you how many people have commented that Fable was great last week. Was it? No one had the ability to use it long enough to get experience with...