AI Safety Is Underfunded by Design: Model for Incentive-Aligned AI Safety Policy

norabble

Ryan Baker May 19, 2026

Dean Ball recently put his finger on something important about AI liability and incentives: In general, market actors do not have great incentives to protect against catastrophic risks. They are massive negative externalities, often dwarfing the balance sheet of any individual firm. Say Anthropic releases a model that a malicious actor uses to conduct a cyberattack that does $5 trillion dollars in damage. Anthropic is only worth $800 billion, so if they get sued for $5 trillion, they are already well past the point of insolvency. A catastrophic harm may well already be “lights out” for Anthropic, or any other company, so there is little incentive to avoid them, if doing so entails real costs in the present day.

He’s right about the structure of the problem, but “little incentive” is less precise than we can be. AI companies do have an incentive to avoid catastrophic outcomes, just systematically less of one than society needs them to have. That gap can be quantified, and quantifying it points toward what a corrective policy should actually look like.

The concerns he’s talking about, catastrophic risks, share a structural feature that distinguishes them from other risks: they are lumpy. They arrive as a single catastrophic event rather than as a diffuse trend. The incentive to avoid them is real and quite large, but not as large as it should be. These dynamics are worth exploring, because they ultimately shape whether and how we structure a response.

Consider a hypothetical AI company worth $800 billion, and a hypothetical event causing $5 trillion in damages. If this event happened, the company would be out of business, so it has an incentive to prevent it. But how much incentive? The most it can lose is the whole company, $800 billion. And since much of that value is goodwill, losses stop mattering even earlier; for the sake of example, say at $400 billion. If you had to pay out half your market cap, you would not be left with a company worth $400 billion; you would be bankrupt and worth $0. Every claim above $400 billion therefore has the same impact, because each produces the same outcome: a total loss.

This creates an imbalance between society’s goals and the AI company’s goals, which can lead to underinvestment in safety or to risk-taking out of step with what society wants. We can quantify the imbalance by modeling a damage cap in expected-value calculations. If our $5 trillion event has a 1 in 10,000 chance of occurring, the uncapped expected value of avoiding it is $500 million. With a damage cap of $400 billion, it’s only $40 million. Society should want the other $460 million of incentive to be felt by the AI company too, but without some arrangement, it isn’t.

Refinements
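The capped expected-value arithmetic can be checked in a few lines of Python. This is a minimal sketch: the names are illustrative, amounts are in millions of dollars, and modeling the cap as a simple `min()` is my own simplification of the point that every claim above $400 billion has the same impact.

```python
# Capped vs. uncapped expected value of avoiding a catastrophic event.
# All amounts are in millions of US dollars. Names are illustrative only.

DAMAGE = 5_000_000      # $5 trillion in societal damage
LOSS_CAP = 400_000      # $400 billion: the point at which the company is wiped out
P_EVENT = 1 / 10_000    # chance the event occurs


def company_loss(damage: float, cap: float = LOSS_CAP) -> float:
    """The company can never lose more than the cap; larger claims all mean total loss."""
    return min(damage, cap)


uncapped_ev = P_EVENT * DAMAGE               # what society stands to lose
capped_ev = P_EVENT * company_loss(DAMAGE)   # what the company stands to lose
incentive_gap = uncapped_ev - capped_ev      # incentive society wants but the company lacks

# Any claim above the cap has the same impact on the company.
assert company_loss(1_000_000) == company_loss(DAMAGE) == LOSS_CAP

print(f"uncapped expected value: ${uncapped_ev:,.0f}M")
print(f"capped expected value:   ${capped_ev:,.0f}M")
print(f"incentive gap:           ${incentive_gap:,.0f}M")
```

Running it prints $500M, $40M and $460M, matching the figures in the text.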

I used a simple model above, one in which safety investment has linear effectiveness. In reality it isn’t linear. In a linear model, spending $500 million reduces the risk to zero, and spending $40 million (8%, or roughly 1/12th, of that) cuts the risk by 8%, from 1 in 10,000 to about 1 in 10,870. But we could imagine, and in fact should expect, that the first $40 million does more than the next $40 million. Maybe the first reduces the risk to 1 in 100,000, and the next to 1 in a million. That’s the same proportional improvement, 10x. But the first step reduces the risk from 100 per million to 10 per million, a reduction of 90 per million, while the second reduces it from 10 per million to 1 per million, a reduction of only 9 per million.

To illustrate, I constructed a model that used logarithmic decay from the initial 1 in 10,000. In this model, under its default incentives, the AI company would want to spend $13.3 million to reduce its expected (capped) loss from $40 million to $8.7 million. But the societal risk is still $122 million at that point. The goal of a corrective policy would be for the AI company to act on the societal risk, which justifies spending $35.2 million to reduce the societal risk to $8.7 million.
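A minimal Python sketch of a decay model of this kind follows. It is not the app’s code: I have assumed that risk falls exponentially with spending (so the logarithm of the risk falls linearly, as in the 10x-per-step example), and I picked a hypothetical decay scale `SCALE` of $8.7 million because it reproduces the figures above. Under that form, anyone minimizing spend plus expected loss stops exactly when the remaining expected loss equals the decay scale, which would explain why both optima in the text land on $8.7 million.

```python
import math

# Amounts in millions of US dollars.
P0 = 1 / 10_000        # baseline chance of the event
DAMAGE = 5_000_000     # $5 trillion societal damage
LOSS_CAP = 400_000     # $400 billion company loss cap
SCALE = 8.7            # ASSUMED decay scale: each $8.7M of spending cuts risk by a factor of e


def expected_loss(exposure: float, spend: float, scale: float = SCALE) -> float:
    """Expected loss when risk decays exponentially with safety spending."""
    return P0 * exposure * math.exp(-spend / scale)


def optimal_spend(exposure: float, scale: float = SCALE) -> float:
    """Spend that minimizes spend + expected loss.

    Setting d/ds [s + E0*exp(-s/scale)] = 0 gives E0*exp(-s/scale) = scale,
    so s* = scale * ln(E0 / scale); spending nothing is optimal if E0 <= scale.
    """
    e0 = P0 * exposure
    return scale * math.log(e0 / scale) if e0 > scale else 0.0


company_spend = optimal_spend(LOSS_CAP)   # driven by the capped $40M exposure
society_spend = optimal_spend(DAMAGE)     # driven by the full $500M exposure
residual_at_company_spend = expected_loss(DAMAGE, company_spend)

print(f"company spends ${company_spend:.1f}M, leaving "
      f"${expected_loss(LOSS_CAP, company_spend):.1f}M of capped loss")
print(f"societal expected damage at that point: ${residual_at_company_spend:.1f}M "
      f"(${residual_at_company_spend + company_spend:.1f}M including the spend)")
print(f"society would spend ${society_spend:.1f}M, leaving "
      f"${expected_loss(DAMAGE, society_spend):.1f}M")
```

Under these assumptions, the company’s $13.3 million leaves about $109 million of societal expected damage, or about $122 million once the spending itself is counted.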

Further refinement considers whether organic spending is more efficient than regulatory-induced spending. For example, if regulatory-induced spending had half the effect per dollar of organic spending, not only would total spending go up, but the residual damage would also be higher. Three scenarios illustrate this. In Scenario 1, all spending is equally effective. In Scenario 2, the company spends efficiently up to the level its capped incentive justifies, after which each additional dollar buys only $0.50 of effective safety. In Scenario 3, all spending is 50% effective.
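The three scenarios can be compared with the same kind of sketch. Again, the exponential decay, the hypothetical $8.7 million scale, and the assumption that policy pushes total spending to the societal optimum are mine; only the 50% efficiency figure comes from the text, and the app may compute these cases differently.

```python
import math

# Amounts in millions of US dollars. SCALE and the exponential form are assumptions.
P0 = 1 / 10_000
DAMAGE = 5_000_000
LOSS_CAP = 400_000
SCALE = 8.7

# What the company would spend on its own, against its capped $40M exposure.
COMPANY_SPEND = SCALE * math.log(P0 * LOSS_CAP / SCALE)


def societal_optimum(organic: float, efficiency: float) -> tuple[float, float]:
    """Total spend and residual expected damage when policy targets the societal optimum.

    `organic` dollars are fully effective; every further dollar buys only
    `efficiency` dollars of effective safety.
    """
    remaining = P0 * DAMAGE * math.exp(-organic / SCALE)  # expected damage after organic spend
    eff_scale = SCALE / efficiency                         # real dollars per e-fold risk cut
    extra = eff_scale * math.log(remaining / eff_scale) if remaining > eff_scale else 0.0
    return organic + extra, remaining * math.exp(-extra / eff_scale)


scenarios = {
    "1: all spending fully effective": societal_optimum(COMPANY_SPEND, 1.0),
    "2: 50% effective beyond capped motivation": societal_optimum(COMPANY_SPEND, 0.5),
    "3: all spending 50% effective": societal_optimum(0.0, 0.5),
}
for name, (spend, residual) in scenarios.items():
    print(f"Scenario {name}: spend ${spend:.1f}M, residual ${residual:.1f}M")
```

Under these assumptions, total spend rises from about $35.2 million (Scenario 1) to about $45.2 million (Scenario 2) and $58.4 million (Scenario 3), while residual expected damage doubles from $8.7 million to $17.4 million in both less efficient scenarios. The stopping point depends only on how effective the marginal dollar is, which is why Scenarios 2 and 3 end at the same residual.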

You can explore other scenarios in the linked single-page app (GitHub Repo), experimenting with different decay functions, damage sizes, and company sizes.

Organic Forces

We’d also be doing ourselves a disservice if we didn’t recognize the outstanding work that AI companies have done —...
