Anthropic's Fable 5 Silent Sabotage Mode

Everett Dutton's Blog

Fable 5 was released today, and with it one of the most disgusting ideas they could have implemented: silent sabotage of the model if you’re developing something they don’t like.

I Don’t Trust Anthropic Anymore

Anthropic has been branding itself as the responsible, human-centric, ethical LLM provider. Today, they released their “Mythos-class” Fable 5 model. I tried it on a machine learning project and wasn’t super impressed (it felt just like Opus 4.8). I was planning to try it some more tomorrow to see if I was just holding it wrong.

For reference, I’m a massive LLM user: coding, research, design, etc. I have been using Claude Code since release day, and have had subscriptions to one or more providers since the spring of 2023.

Today, Anthropic, the “more ethical” of the two American LLM providers (copyright thieves, yes, but …) decided to implement silent sabotage mode for their new flagship Fable 5 model.

Jon Ready writes: https://jonready.com/blog/posts/claude-fable5-is-allowed-to-sabotage-your-app-if-youre-a-competitor.html

“Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).”

This is insane and leaves me with zero trust in Anthropic going forward. I can’t overstate this. We’ve always known there’s a risk that LLM providers would silently steer results to fit their desires (or their government’s desires), but seeing it written like this is proof that they’re doing it now and that they’re going to do it even more in the future. There is zero architectural difference between detecting when someone is working on a “competing product” and detecting “unpermitted freethink”. They might already be doing this!

Maybe that’s why my experience today was subpar. Was I getting a sandbagged model? I’ll never know, because Fable’s sabotage mode is silent. Oh, and I paid full price, too … The safety classifiers for Fable 5 are hypersensitive. For example, the visible, in-your-face guardrail triggered today when I asked it to convert a Markdown file to a PDF. So one can only wonder: how sensitive is the SILENT SABOTAGE MODE?

So, I guess I’ll be cancelling my Claude plan for next month. But then I have to … go to Sam Altman and hope he doesn’t do the same? Ugh. Glad I made my coding agent sandbox multi-vendor so I can switch easily to the least-bad-option-du-jour.

The future is open-weight, locally-hosted models. China is defending computing freedom in this decade, even if they’re just doing it to mess with the American companies. Wild.
