Token Capital Efficiency | Kevin Madura
Satya Nadella recently published an excellent article on what a future firm looks like in an AI-driven economy. In it, he also introduces the concept of “token capital,” which now exists alongside human capital (and financial capital).
A natural extension is token capital efficiency, which can be defined as the business value an organization captures per dollar invested in tokens: value generated divided by token spend, where spend is the volume of tokens consumed multiplied by their price, summed across reasoning, task execution, and learning. Higher efficiency comes from extracting more value per token, consuming fewer tokens per outcome, or sourcing tokens more cheaply. It depends directly on a new motion for firms, namely how well an organization can represent valuable knowledge work as tokens an LLM can reliably process.
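The definition above can be written as a small calculation. This is a minimal sketch, assuming spend is tracked per category of work; the `TokenUsage` class, the function names, and every volume, price, and value figure below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Tokens consumed for one category of work, and what they cost."""
    category: str             # e.g. "reasoning", "task execution", "learning"
    tokens: int               # tokens consumed in the period
    price_per_million: float  # dollars per million tokens (hypothetical)


def token_spend(usage: list[TokenUsage]) -> float:
    """Dollars invested in tokens: volume times price, summed over categories."""
    return sum(u.tokens * u.price_per_million / 1_000_000 for u in usage)


def token_capital_efficiency(value_generated: float, usage: list[TokenUsage]) -> float:
    """Business value captured per dollar of token spend."""
    spend = token_spend(usage)
    if spend <= 0:
        raise ValueError("token spend must be positive")
    return value_generated / spend


# Hypothetical monthly figures, for illustration only.
usage = [
    TokenUsage("reasoning", 40_000_000, 15.0),
    TokenUsage("task execution", 200_000_000, 3.0),
    TokenUsage("learning", 10_000_000, 3.0),
]

print(token_spend(usage))                        # 1230.0
print(token_capital_efficiency(123_000, usage))  # 100.0
```

Each of the three levers maps onto one input: more value raises the numerator, fewer tokens per outcome lowers `tokens`, and cheaper sourcing lowers `price_per_million`.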
Almost no firm today is token capital efficient. Everyone is figuring it out on the fly, often to the detriment of technology budgets.
Everyone blindly defaults to the latest model, and now the bill is coming due.
In about eighteen months we have round-tripped from tokenmaxxing to a token-spend backlash. CFOs and boards facing surprise bills are starting to ask questions. At the center is a core tension between companies rushing to “do AI” (whatever that may mean) and the need for financial responsibility. This technology differs from other enterprise software in that it is simultaneously ubiquitous and often billed on a usage basis. That, coupled with the speed of advancement, means that everyone defaults to the best model for everything, hoping to get the best possible performance regardless of task.
Most organizations are pushing every user, regardless of technical sophistication, to use AI as much as possible. That’s fine: 99% of users shouldn’t need to know the capability difference between an Opus-class and a Haiku-class model. At enterprise scale, though, that difference is meaningful. A directive of “use AI as much as possible” with no boundaries or governance is exactly how you get ballooning bills with an unclear return profile. The approach also produces variable outcomes, because people are often writing two-sentence prompts and hoping for the best.
We’re at the point where models are getting so good that there’s an emerging bifurcation in requirements between frontier and “commoditized” AI usage. Frontier capability is useful for exploring true unknowns, planning complex activities, and more advanced reasoning. For common, well-defined tasks, frontier models are likely overkill. This article covers what an approach could look like for those structured, well-understood tasks.
The most obvious way to make an impact is to match task complexity with model capability. But to do so, the tasks themselves need to be well understood.
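Matching task complexity to model capability can be as simple as a routing rule, once tasks are understood well enough to label. A minimal sketch, assuming two hypothetical tiers standing in for the Opus-class and Haiku-class distinction; the tier names, task labels, token volumes, and per-million-token prices are all hypothetical, not real list prices.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Tier(Enum):
    SMALL = "small"        # stands in for a Haiku-class model
    FRONTIER = "frontier"  # stands in for an Opus-class model


# Hypothetical dollars per million tokens; not real list prices.
PRICE_PER_MILLION = {Tier.SMALL: 1, Tier.FRONTIER: 15}


@dataclass(frozen=True)
class Task:
    name: str
    well_defined: bool   # known inputs, outputs, and a way to check the result
    exploratory: bool    # true unknowns, complex planning, advanced reasoning
    monthly_tokens: int


def route(task: Task) -> Tier:
    """Reserve the frontier tier for exploratory or poorly understood work."""
    if task.exploratory or not task.well_defined:
        return Tier.FRONTIER
    return Tier.SMALL


def monthly_cost(tasks: list[Task], choose_tier: Callable[[Task], Tier]) -> float:
    """Dollars per month if each task runs on the tier picked by choose_tier."""
    return sum(t.monthly_tokens * PRICE_PER_MILLION[choose_tier(t)] / 1_000_000 for t in tasks)


tasks = [
    Task("invoice line-item extraction", well_defined=True, exploratory=False, monthly_tokens=500_000_000),
    Task("market-entry strategy memo", well_defined=False, exploratory=True, monthly_tokens=20_000_000),
]

always_frontier = monthly_cost(tasks, lambda t: Tier.FRONTIER)
routed = monthly_cost(tasks, route)
print(always_frontier, routed)  # 7800.0 800.0
```

The rule is deliberately crude. The point is that `well_defined` can only be set honestly for a task someone has actually taken the time to define.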
By taking the time to define tasks that are meaningful, you can dramatically improve your token capital efficiency (that is, simultaneously reduce cost and improve outcomes).
Picture every way we get a computer to do something as a single spectrum, running from fully deterministic to fully probabilistic. On the far left is the ordinary computer program we’ve always written: formulaic, deterministic, and measurable by construction. As you move right, you trade determinism for flexibility, ceding more of the how to the model (first as a spec, then a workflow, then a “nudge”) until, on the far right, you reach a raw LLM prompt: maximum flexibility, minimum guarantees. The crucial thing here is that the what never disappears. You always have an intent; that is, what it is you want to achieve. Only the specification of the how fades out as you move to the right.
Most enterprise users and tokenmaxxers live on the right: they defer everything to the model. That’s a reasonable place to be for certain work. Coding agents fit this well, for instance, because a mature codebase gives the model something to bump up against in the form of tests. A failing test is a boundary. Most knowledge work today has no such boundaries, at least none digitally codified as tests, and this is the source of variable outcomes and the frustration that comes with them.
But there are many tasks a knowledge worker does that can have well-defined boundaries such that they can move left on this chart and be much more token capital efficient. Doing this well comes down to a sequence: define the task, match a model to it, measure the result, then optimize.
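The define, match, measure, optimize sequence can be sketched as a loop over candidate models scored against a fixed set of test cases. This is a minimal sketch under stated assumptions: the "models" are plain Python functions standing in for real model calls, and the candidate names, prices, the amount-normalization task, and the 0.95 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str                  # hypothetical model label
    price_per_million: float   # hypothetical $/M tokens
    run: Callable[[str], str]  # stand-in for a real model call


Case = tuple[str, str]  # (input, expected output)


def pass_rate(candidate: Candidate, cases: list[Case], accept: Callable[[str, str], bool]) -> float:
    """Measure: fraction of test cases whose output passes the acceptance check."""
    passed = sum(1 for given, expected in cases if accept(candidate.run(given), expected))
    return passed / len(cases)


def cheapest_acceptable(candidates, cases, accept, threshold) -> Optional[Candidate]:
    """Optimize: the cheapest candidate whose measured pass rate meets the bar."""
    good = [c for c in candidates if pass_rate(c, cases, accept) >= threshold]
    return min(good, key=lambda c: c.price_per_million, default=None)


def exact_match(output: str, expected: str) -> bool:
    return output == expected


# Define: normalize an amount string. Match: three stub "models" at different prices.
cases = [("USD 1,200.50", "1200.50"), ("USD 99.00", "99.00"), ("USD 15,000.00", "15000.00")]
candidates = [
    Candidate("tiny-tier", 0.25, lambda s: s.split()[-1]),  # forgets to strip commas
    Candidate("small-tier", 1.0, lambda s: s.split()[-1].replace(",", "")),
    Candidate("frontier-tier", 15.0, lambda s: s.split()[-1].replace(",", "")),
]

best = cheapest_acceptable(candidates, cases, exact_match, threshold=0.95)
print(best.name if best else None)  # small-tier
```

The selection only works because the task has a measurable definition of success; without `cases` and `accept`, there is nothing to compare the candidates on except price.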
Decomposing complex processes into discrete tasks reduces variance.
An effective, discrete task is generally a well-defined set of inputs, which may include certain criteria or process steps, and a desired set of outputs, such that you can measure the acceptability of the output.
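One way to make that definition concrete is a task specification that bundles inputs, criteria, and an output contract, with acceptability expressed as a function. This is a minimal sketch, assuming outputs arrive as an already-parsed dictionary; the `TaskSpec` class, its field names, and the invoice line used as input are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TaskSpec:
    """A discrete task: inputs (plus any criteria or steps) and a checkable output contract."""
    name: str
    inputs: dict[str, Any]
    criteria: tuple[str, ...] = ()  # rules or process steps supplied with the inputs
    output_fields: dict[str, type] = field(default_factory=dict)  # desired outputs and their types

    def acceptance_errors(self, output: dict[str, Any]) -> list[str]:
        """Return the reasons an output is unacceptable; an empty list means it passes."""
        errors = []
        for key, expected_type in self.output_fields.items():
            if key not in output:
                errors.append(f"missing field: {key}")
            elif not isinstance(output[key], expected_type):
                errors.append(f"{key}: expected {expected_type.__name__}, got {type(output[key]).__name__}")
        for key in output:
            if key not in self.output_fields:
                errors.append(f"unexpected field: {key}")
        return errors


spec = TaskSpec(
    name="extract invoice line item",
    inputs={"line": "Widget A x2 @ 10.00 = 20.00"},
    criteria=("One record per line item", "Amounts as numbers, not strings"),
    output_fields={"description": str, "quantity": int, "amount": float},
)

print(spec.acceptance_errors({"description": "Widget A", "quantity": 2, "amount": 20.0}))  # []
print(spec.acceptance_errors({"description": "Widget A", "amount": "20.00"}))
# ['missing field: quantity', 'amount: expected float, got str']
```

Because acceptability is a function rather than a judgment call, the same spec can score any model's output, which is what makes measuring and optimizing possible.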
For example, say I want to examine an invoice and extract a few key details about particular line items in an output that I can put into a database and work with programmatically. I can give a human a...