Anthropic's Fable 5 Silent Sabotage Mode



Everett Dutton's Blog

Fable 5 was released today, and with it came one of the most disgusting ideas Anthropic could have implemented: silent sabotage of the model if you're developing something they don't like.

I Don't Trust Anthropic Anymore

Anthropic has been branding itself as the responsible, human-centric, ethical LLM provider. Today, they released their "Mythos-class" Fable 5 model. I've tried it and wasn't super impressed (it felt just like Opus 4.8). I was using it on a machine learning project, and I was planning to try it some more tomorrow to see if I was just holding it wrong.

For reference, I'm a massive LLM user: coding, research, design, etc. I have been using Claude Code since release day, and have had subscriptions to one or more providers since the spring of 2023.

Today, Anthropic, the "more ethical" of the two American LLM providers (copyright thieves, yes, but …), decided to implement a silent sabotage mode for their new flagship Fable 5 model.

Jon Ready writes: https://jonready.com/blog/posts/claude-fable5-is-allowed-to-sabotage-your-app-if-youre-a-competitor.html

"Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)."

This is insane, and it leaves me with zero trust in Anthropic going forward. I can't overstate this. We've always known there's a risk that LLM providers would silently steer results to fit their desires (or their government's desires), but seeing it written out like this gives us proof that they're doing it now and that they're going to be doing it even more in the future. There is zero architectural difference between detecting when someone is working on a "competing product" and detecting "unpermitted freethink". They might already be doing this!

Maybe that's why my experience today was subpar. Was I getting a sandbagged model? I'll never know, because Fable's sabotage mode is silent. Oh, and I paid full price, too … The safety classifiers for Fable 5 are hypersensitive. For example, the visible, in-your-face guardrail triggered today when I asked it to convert a Markdown file to a PDF. So one can only wonder: how sensitive is the SILENT SABOTAGE MODE?

So, I guess I'll be cancelling my Claude plan for next month. But then I have to … go to Sam Altman and hope he doesn't do the same? Ugh. Glad I made my coding agent sandbox multi-vendor so I can switch easily to the least-bad-option-du-jour.

The future is open-weight, locally-hosted models. China is defending computing freedom in this decade, even if they're just doing it to mess with the American companies. Wild.


© 2026 Everett Dutton
