Fable Is Back: This Safeguard Has Some AI in It



The Algorithmic Bridge


Let's analyze Anthropic and the US government's comms

Alberto Romero
Jul 01, 2026



I will keep you updated on the events surrounding Fable until the situation normalizes.

Today, July 1, Anthropic’s Fable 5 is back.

But there’s some fine print attached to the redeployment, and I want to comment on that. I will quote excerpts from Anthropic’s blog post and Commerce Secretary Lutnick’s letter. To see where the AI industry’s heading, we just need to read between the lines of what they say in public.

Here’s Anthropic’s blog post:

Fable 5 will be available starting tomorrow, Wednesday, July 1, to users globally on the Claude Platform, Claude.ai, Claude Code, and Claude Cowork. For Pro, Max, Team, and select Enterprise plans, Fable 5 will be included for up to 50% of weekly usage limits through July 7, after which it will be available via usage credits.

This is more or less what we had before the export restriction, except for two things: 1) you have one week of Fable under your paid subscription instead of two (after which it moves to a pay-as-you-go credit system; that part doesn’t change) and 2) only up to 50% of the tokens can go to Fable instead of 100%. (No Mythos either way, as expected.) I find it interesting that they chose the 50% limit. It’s bad optics in the sense that it’s not clean, and it also feels unnecessary. It’s probably necessary, though, or they wouldn’t do it, which can only mean that they don’t have the compute.

The export control directive on June 12 came after the government became aware of a report in which Amazon researchers had found a method of bypassing Fable 5’s safeguards: prompting it so that it identified a number of software vulnerabilities. . . . Our testing confirmed that many less capable models—including Claude Opus 4.8, GPT-5.5, and Kimi K2.7—could identify the same vulnerabilities as Fable 5 did in the [Amazon] report.

This is the jailbreak that sparked the withdrawal of the model. Anthropic is restating what it had already told the government (the government didn’t like it, which prompted the export control restriction): the jailbreak is not an issue because it does not bear on Fable’s broader capabilities relative to other models. It is a known, lower-priority jailbreak that poses little actual danger and is found everywhere.

This reads like a defense but, together with the whole “the industry needs a consistent way to assess and fix potential ‘jailbreaks’ of AI models,” it’s also a jab at those players not targeted by the government (particularly open-source models).

. . . there are some tasks that are unlikely to be dangerous but are nonetheless blocked by the safeguards out of an abundance of caution. . . .

Tighter safeguards mean lower capabilities and thus greater unreliability for the user. An “abundance of caution” is their way of saying the new Fable 5 will be more crippled than the previous version, which was already a downgrade from Mythos. Anthropic tends to err on the side of caution, but this was the government’s doing; take this as a sign of what’s coming with future models.

As stated, this is not that bad (who doesn’t want to keep bad things from happening, right?). The problem comes with what “abundance” and “caution” mean here, about which we have no say whatsoever, nor apparently does Anthropic.

Working closely with the government, we trained an improved safety classifier that targets and blocks the behavior described in the report. Users will be notified if a request to Fable 5 is blocked, and the request will instead be sent to Opus 4.8.

Ok, so no invisible re-routing at all, which is great. “Improved” here presumably means fewer false positives, but it can also mean less permissive and thus less risky.

The new classifier also comes at the cost of flagging benign requests more often during routine coding and debugging tasks. As with all our safeguards, we’ll continue to refine this to better distinguish genuine misuse from legitimate requests and reduce false positives. . . . This “safety margin” approach means that a request has to look very clearly safe to avoid triggering the classifier (see row A in the diagram below). Users experience the safety margin as a model refusing to respond to some reasonable, non-harmful requests. For Fable 5, we made this safety margin much larger than in any prior launch (row B), meaning that many more benign requests would be blocked.
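To make the “safety margin” idea concrete, here is a toy sketch in Python. Everything in it is my own assumption for illustration: the risk scores, the 0.5 threshold, the two margin values, and the model identifiers `fable-5` and `opus-4.8` are made up, and nothing here reflects how Anthropic’s classifier actually works. It only shows why widening the margin blocks more benign requests, and how a blocked request might be handed to a fallback model with a notice to the user.

```python
from dataclasses import dataclass

# Hypothetical value: the score at which a request would be judged harmful outright.
BLOCK_THRESHOLD = 0.5


@dataclass
class Routed:
    model: str      # hypothetical model identifier
    notified: bool  # True when the user is told the request was re-routed


def route(risk_score: float, safety_margin: float) -> Routed:
    """Send a request to the main model only if it clears the threshold by the margin."""
    if risk_score < BLOCK_THRESHOLD - safety_margin:
        return Routed(model="fable-5", notified=False)
    # Blocked: fall back to the older model and notify the user (no silent re-routing).
    return Routed(model="opus-4.8", notified=True)


def blocked_share(scores: list[float], safety_margin: float) -> float:
    """Fraction of requests that end up blocked for a given margin."""
    blocked = [s for s in scores if route(s, safety_margin).notified]
    return len(blocked) / len(scores)


# Made-up risk scores for ten benign coding requests.
benign_scores = [0.02, 0.05, 0.08, 0.12, 0.18, 0.22, 0.27, 0.33, 0.38, 0.45]

small = blocked_share(benign_scores, safety_margin=0.1)  # a smaller margin
large = blocked_share(benign_scores, safety_margin=0.3)  # a much larger margin

print(f"blocked with small margin: {small:.0%}")
print(f"blocked with large margin: {large:.0%}")
```

With these invented numbers, the smaller margin blocks one benign request in ten and the larger margin blocks five in ten. The requests haven’t changed; only the bar for “very clearly safe” has.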

Alright, there you go. This is the one clearly backward move. Fable 5 was criticized initially due to an exaggerated sensitivity to standard prompts. If this new version will flag “benign requests...
