FinOps for AI: How We Get AI Costs Under Control - forwardnow GmbH
Why tokens are becoming the new unit of cost, and how transparency, clear limits, and empowered teams keep it under control
A new cost dimension with familiar patterns
One example caused a stir across the industry: an AI consultant told the news outlet Axios that one of their clients burned through roughly half a billion dollars in a single month because no usage limits had been set on the employees’ AI licenses. The case sounds extreme, and at that scale it is the exception. The underlying pattern is not. On a smaller scale, many organizations are experiencing it right now: a bill jumps from a few hundred to several thousand euros a month, without any alarm going off in the system or any way to tell which service or which user caused the increase.
Anyone familiar with FinOps from the cloud world recognizes the problem immediately. It is not the individual expensive token that drives the cost, but the lack of visibility and the absence of clear boundaries. This is exactly where FinOps comes in: it connects financial governance with operational transparency, so that business units, IT, and finance can make decisions on a shared data basis. The same logic applies to AI. Only the unit of cost has changed.
The good news up front: there is now an open standard for this transparency. Just as logs, metrics, and traces became established in classic monitoring, the OpenTelemetry GenAI Semantic Conventions provide a uniform vocabulary to capture AI consumption in a vendor-neutral way and attribute it to individual sources. This standard is the common thread running through the sections that follow, from attribution through architecture to ongoing controlling.
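Here is a minimal sketch of what this vocabulary looks like for a single chat call. The `gen_ai.*` names follow the OpenTelemetry GenAI Semantic Conventions. Those conventions are still marked as in development, so names may change between versions; earlier versions, for example, used `gen_ai.system` instead of `gen_ai.provider.name`. The conventions do not define team or feature attributes, so `app.user_id`, `app.team`, and `app.feature` are hypothetical custom attributes. The helper `genai_call_attributes` is likewise illustrative and is not part of any OpenTelemetry library.

```python
from typing import Any


def genai_call_attributes(
    provider: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
    user_id: str,
    team: str,
    feature: str,
) -> dict[str, Any]:
    """Build span attributes for one model call (hypothetical helper)."""
    return {
        # GenAI semantic convention attributes (status: development).
        "gen_ai.operation.name": "chat",
        "gen_ai.provider.name": provider,  # older versions: gen_ai.system
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
        # Hypothetical custom attributes that make the call attributable.
        "app.user_id": user_id,
        "app.team": team,
        "app.feature": feature,
    }


attrs = genai_call_attributes(
    provider="openai",
    model="gpt-4o-mini",
    input_tokens=1200,
    output_tokens=350,
    user_id="u-4711",
    team="customer-support",
    feature="ticket-summary",
)
for key, value in attrs.items():
    print(f"{key} = {value}")
```

In a real service, these key-value pairs would be set on the span of the model call, for example with `span.set_attribute(...)` from the OpenTelemetry API. Any downstream cost report could then group by them.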
In short, FinOps for AI rests on five levers: first, transparency through token attribution, meaning the assignment of every model call to user, team, and feature; second, an AI proxy as a central control point for all AI traffic; third, clear limits and guardrails that prevent uncontrolled costs; fourth, continuous optimization through the right model choice, lean calls, and caching; and fifth, empowered teams that understand and own their costs. The following sections walk through these levers one by one.
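To make the second and third levers more concrete, here is a minimal sketch of a check an AI proxy could run before it forwards a call. Each team has a monthly budget. A warning fires once a configurable share of that budget is used, and calls that would exceed the hard limit are rejected. The class `TeamBudgetGuard`, the 80 percent warning threshold, and the euro amounts are all hypothetical. A production proxy would persist spend instead of keeping it in memory, and would reconcile the estimated cost with the actual token usage after each call.

```python
from dataclasses import dataclass, field


@dataclass
class TeamBudgetGuard:
    """Hypothetical in-memory budget check an AI proxy might run per call."""

    monthly_limits_eur: dict[str, float]
    warn_ratio: float = 0.8  # warn once 80 percent of the budget is used
    spent_eur: dict[str, float] = field(default_factory=dict)

    def authorize(self, team: str, estimated_cost_eur: float) -> str:
        """Return 'allow', 'warn' or 'block'; book the cost unless blocked."""
        limit = self.monthly_limits_eur.get(team)
        if limit is None:
            return "block"  # no budget defined: no attribution, no call
        projected = self.spent_eur.get(team, 0.0) + estimated_cost_eur
        if projected > limit:
            return "block"
        self.spent_eur[team] = projected
        return "warn" if projected >= self.warn_ratio * limit else "allow"


guard = TeamBudgetGuard(monthly_limits_eur={"marketing": 100.0})
print(guard.authorize("marketing", 50.0))  # → allow (50 of 100 used)
print(guard.authorize("marketing", 35.0))  # → warn (85 of 100 used)
print(guard.authorize("marketing", 20.0))  # → block (would reach 105)
print(guard.authorize("research", 1.0))    # → block (no budget defined)
```

Running the check centrally in the proxy means no individual application has to implement limits itself. It also means an unknown or untagged caller is stopped by default instead of silently adding to the bill.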
Tokens are money: AI does not bill like classic software
Classic software is mostly billed per license or per seat. Costs are predictable and rarely change between two billing periods, and procurement runs through a clearly defined purchasing process. AI behaves differently. Here the unit of cost is the token, and costs arise anew with every single call, depending on the length of the input, the length of the response, and the chosen model.
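The dependence on input length, response length, and model can be written down directly. Providers typically quote separate prices for input and output tokens, often per million tokens. The sketch below assumes two hypothetical models with made-up euro prices. Real price lists differ by provider and change over time.

```python
# Hypothetical prices in euros per one million tokens: (input, output).
PRICES_EUR_PER_MTOK: dict[str, tuple[float, float]] = {
    "small-model": (0.15, 0.60),
    "large-model": (2.50, 10.00),
}


def call_cost_eur(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single call: each token count times its per-token price."""
    input_price, output_price = PRICES_EUR_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# The same request (2,000 tokens in, 500 tokens out) on both models.
for model in PRICES_EUR_PER_MTOK:
    print(model, f"{call_cost_eur(model, 2_000, 500):.4f} EUR")
```

Under these assumed prices, the request costs 0.0006 EUR on the small model and 0.0100 EUR on the large one, roughly 17 times as much. At one million such calls a month, that is the difference between about 600 and 10,000 euros.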
Here, a parallel to the cloud is decisive, and it is often underestimated: the purchasing decision is democratized within the organization. Just as the cloud shifted the procurement of infrastructure out of purchasing and into the hands of engineering, AI distributes spending authority even more finely. A central body no longer decides on costs; instead, every developer triggers real spending with every prompt, every model choice, and every agent they start. Cost-relevant decisions are therefore made orders of magnitude more often than with classic software.
The effect is especially pronounced with agentic workflows, that is, AI systems that handle multi-step tasks autonomously. Such processes consume many times the tokens of a single model call because they work in loops, repeatedly carry context along, and generate intermediate steps. A single careless loop or an unbounded background job can therefore cause significant costs in a short time. A cloud misconfiguration often takes days to show its effect, whereas an agent running out of control can become costly within minutes.
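A rough model shows why loops are so expensive. If an agent resends the full history on every step, the input of each step includes the output of all previous steps, so total input tokens grow quadratically with the number of steps. The sketch below assumes a fixed prompt size and a fixed number of output tokens per step. This is a simplification: real agents also add tool results to the context, which makes the growth steeper.

```python
def agent_loop_tokens(steps: int, prompt_tokens: int, output_per_step: int) -> int:
    """Total tokens (input + output) for an agent that resends its history.

    Step k (1-based) sends the prompt plus the outputs of the k-1 earlier
    steps as input, then generates output_per_step new tokens.
    """
    total = 0
    for k in range(1, steps + 1):
        input_tokens = prompt_tokens + (k - 1) * output_per_step
        total += input_tokens + output_per_step
    return total


single_call = agent_loop_tokens(1, prompt_tokens=1_000, output_per_step=500)
for steps in (1, 10, 50):
    total = agent_loop_tokens(steps, prompt_tokens=1_000, output_per_step=500)
    print(f"{steps:>3} steps: {total:>9,} tokens ({total / single_call:.0f}x a single call)")
```

With these assumptions, a single call uses 1,500 tokens. Ten steps already use 37,500 tokens, 25 times as many, and fifty steps use 687,500, about 458 times as many. This is why step limits and context trimming belong among the guardrails for agents.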
This shifts the central question. It is no longer whether AI should be used, but how its consumption can be made visible, attributed to individual sources, and limited when necessary.
From cloud tagging to token tagging
The most important idea for decision-makers is this: AI cost control is not an entirely new problem. It is FinOps with finer granularity and much higher velocity. Anyone who has established a tagging strategy in the cloud already has the decisive mindset. It only needs to be transferred to the new unit of cost.
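Applied to tokens, a tagging strategy means that every usage record carries the same kind of dimensions as a cloud resource, so that costs can be grouped along any of them. The following sketch aggregates a few hypothetical usage records by team and by feature. The record fields, names, and amounts are illustrative; in practice, the records would come from telemetry or a proxy log.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One model call with its (hypothetical) attribution tags and cost."""

    user: str
    team: str
    feature: str
    cost_eur: float


def cost_by(records: list[UsageRecord], tag: str) -> dict[str, float]:
    """Sum costs per value of the given tag, e.g. 'team' or 'feature'."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, tag)] += record.cost_eur
    return dict(totals)


records = [
    UsageRecord("anna", "support", "ticket-summary", 12.50),
    UsageRecord("ben", "support", "reply-draft", 4.25),
    UsageRecord("carla", "sales", "reply-draft", 8.00),
    UsageRecord("carla", "sales", "contract-check", 30.00),
]
print(cost_by(records, "team"))     # → {'support': 16.75, 'sales': 38.0}
print(cost_by(records, "feature"))  # → {'ticket-summary': 12.5, 'reply-draft': 12.25, 'contract-check': 30.0}
```

The same records answer both the team question and the feature question. This is the point of tagging at the level of individual calls rather than per license.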
In cloud FinOps, we attribute every resource via tags such as cost center, team, environment, or project. AI needs exactly the same discipline, only now the tags hang on every model call: user, team, feature, or workflow. Without this...