Is this what driving an F1 car feels like?

Cameron Westland

May 20, 2026

My 30-day token bill is $11,232.54. Fourteen billion tokens, almost all of it GPT-5.5 on a ChatGPT Pro 20x plan. Yesterday alone I put $1,157.78 and 1.4 billion tokens through the harness. Today, mid-afternoon, I’m already at $226 and 270M tokens.

That’s nothing next to Peter Steinberger: $1.3M and 603 billion tokens against the OpenAI API in 30 days. Peter noted in replies that disabling fast mode would cut that by ~70%, so the comparable number is closer to $390k. Either way, it’s a different category: Peter has described his stack publicly, running ~100 Codex instances in the cloud doing PR review, security scans, issue de-duplication, performance benchmarks, and meeting listeners that auto-start PRs. His framing question: “how would we build software in the future if tokens don’t matter?” Mine is $11k at standard rates, with no fast mode and exclusively me at the keyboard: interactive development, one driver, fifteen sessions in flight. Every thousand dollars of that bill correlates one-to-one with shipped output: PRs merged, features in production, cycle times I can measure.
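The numbers above are easy to sanity-check. This is a back-of-envelope sketch using only the rounded figures quoted in this post; it assumes the ~70% fast-mode saving applies as a flat multiplier to the whole bill, which is a simplification, and the helper name is just for illustration.

```python
# Back-of-envelope check of the bills quoted above. Inputs are the rounded
# figures from this post, so every output is approximate.

def usd_per_million_tokens(cost_usd: float, tokens: float) -> float:
    """Blended cost in USD per one million tokens."""
    return cost_usd / tokens * 1_000_000

# My 30-day bill: $11,232.54 over ~14 billion tokens.
mine = usd_per_million_tokens(11_232.54, 14e9)

# Peter's 30 days: ~$1.3M over ~603 billion tokens with fast mode on.
peter_fast = 1_300_000
# Assumption: the ~70% saving is a flat cut to the total bill.
peter_no_fast = peter_fast * (1 - 0.70)

print(f"mine:            ${mine:.2f} per 1M tokens")
print(f"peter (fast):    ${usd_per_million_tokens(peter_fast, 603e9):.2f} per 1M tokens")
print(f"peter (no fast): ${peter_no_fast:,.0f} total, "
      f"~{peter_no_fast / 11_232.54:.0f}x my bill")
```

This prints roughly $0.80 per million tokens for my usage, $2.16 per million for Peter's fast-mode bill, and a no-fast-mode total of $390,000, about 35x mine.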

There is a new performance class in software engineering, and it is already pulling apart into something more like F1 than the old IC ladder. The skill isn’t prompting. The skill is driving 10–15 concurrent agents across multiple projects without crashing any of them. I think I’m in that class, and I’m a little afraid of what that means.

What this actually looks like

On a normal day I’m running five parallel tracks per project, across three projects. Fifteen agents in flight, give or take. Each one has its own code review loop, its own test harness.

That’s ten on one project. Add the other two repos and I’m in the 10–15 range. The meta is constant, per thread:

Is the codebase shaped right for what this agent is doing?

Does it have enough context?

Does it have the right feedback mechanism (tests, API keys, dev creds, something real to validate against)?

Where’s the iteration loop slow, and can I shorten it?

What did I learn here that I should fold into the next session?

As you turn up the concurrency, the probability that something is going wrong at any given moment converges to one. There’s always something to check, always feedback to give.
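One way to see why it converges to one: if each session independently needs attention at a given moment with some probability p, the chance that at least one of n sessions does is 1 - (1 - p)^n, which approaches 1 as n grows. The sketch below assumes independence and uses a made-up p = 0.15 purely for illustration; neither is a measurement from my sessions.

```python
# Probability that at least one of n agent sessions needs attention right now,
# assuming each session independently needs attention with probability p.
# p = 0.15 below is a hypothetical illustrative value, not a measurement.

def p_any_needs_attention(p: float, n: int) -> float:
    """Return 1 - (1 - p)**n for n independent sessions."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.15  # hypothetical per-session chance of needing a check
    for n in (1, 5, 10, 15):
        print(f"{n:>2} agents: {p_any_needs_attention(p, n):.2f}")
```

With p = 0.15 this prints about 0.15, 0.56, 0.80, and 0.91: at fifteen agents, something needs you more than nine times out of ten.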

The closest thing I’ve felt to this before was running a product org as it scaled from under ten to fifty to seventy-five people. Past about fifty, there was a human crisis somewhere every single day. My calendar was solid. I’m now feeling the same nervous-system load with 10–15 agents instead of 75 humans. I’m getting better at it. The models and the harness are too. That fusion is the cyborg part.

The new grading rubric

When I was a software architect at Autodesk (the role they’d now call distinguished engineer), there were a handful of real 10x developers on the graphics and systems teams. I watched specific people, more than once, do more in a day than the rest of us did in a quarter. I’d sit there in awe.

This isn’t another 10x story. Those guys were graded on raw output. The new class is graded on different dimensions:

How many agents can you effectively manage at once?

Are the outputs usable, or just demos?

Will the people who depend on you accept what you ship?

Are you thinking about ops, on-call, regressions, or just throwing prototypes over the wall?

Will you put your name on the bug fixes for the next year and stake your reputation on it?

Operator-class, not IC-class. An exceptional driver in an exceptional machine; neither alone is enough.

The team behind the driver matters too. The engine team is whoever’s training the model. The controls team is broader than the harness vendors (Codex, Claude Code, Cursor): it also includes everything my company has built around them. Fast CI. API-accessible logs (Vercel). Metrics (Grafana). Databases (Supabase). The internal skills and scripts that let agents iterate without me holding their hands. When Phil ships a unified search across our GCP logs, he’s the trackside tire engineer delivering an upgrade between sessions. I’m the driver, doing sim runs and leveling up. Not solo work, even if the seat is single-occupancy.

The spec ops frame

Dan Shapiro maps AI-assisted programming onto a driving-automation ladder. Level 0 is spicy autocomplete. Level 2 is pair programming where you review every line (Shapiro thinks 90% of “AI-native” devs are stuck there). Level 4 is engineering manager for a team of agents. Level 5 is the dark factory: nobody reads the code, ever.

The cleanest Level 5 example is StrongDM’s AI Factory: “code must not be written by humans, code must not be reviewed by humans.” They built a digital-twin universe of cloned Okta, Jira, and Slack so...
