Claude Fable 5 silently degrades its own performance on frontier AI work

Fable 5 Considered Harmful - by Michael Kotlikov

Fable 5 Considered Harmful

Premium rates for degraded output.

Michael Kotlikov Jun 10, 2026

Anthropic built its frontier models by ingesting the open internet, then spent two years arguing that anyone doing the same to its outputs is a thief. Its Usage Policy bans model scraping and distillation of Claude. Its Consumer Terms forbid using Claude to build competing products. Fable 5 is something else: a model that will silently sabotage its own performance on purpose when it decides your prompt is attempting an action that might compete with Anthropic.

What the system card says

Below the cybersecurity and biology filters, the Fable 5 and Mythos 5 system card describes a separate safeguard for frontier LLM development. Anthropic says it limits Claude’s effectiveness on requests targeting frontier LLM development, for example pretraining pipelines, distributed training infrastructure, or ML accelerator design. Unlike the other safeguards, these will not be visible to the user. Fable 5 will not fall back to a different model. It will reduce its own effectiveness through prompt modification, steering vectors, or parameter-efficient fine-tuning. The model does not refuse. It does not route you to a weaker model and say so. It keeps answering while getting deliberately worse, and it does not tell you.

You always pay full price

Fable 5 lists at ten dollars per million input tokens and fifty per million output. When the safeguard fires, the price does not change. There is no degraded-mode discount and no line item saying you got a weakened answer. The receipt for sabotaged output is identical to the receipt for full performance. Every other safeguard at least returns something legible. A refusal frees your tokens. A fallback to Opus 4.8 signals, through the drop in capability, that a filter fired. This one returns nothing and tells you nothing. If a fuel station sold you premium, charged you for premium, and piped you regular when it decided you were the wrong kind of customer, nobody would call that a safeguard.

What’s the frontier?
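To put numbers on the pricing point made earlier, here is a minimal sketch that computes a bill from the list prices quoted in this post (ten dollars per million input tokens, fifty per million output). The function name `bill_usd`, the `degraded` flag, and the token counts are hypothetical illustrations, not Anthropic's billing code.

```python
from decimal import Decimal

# List prices as quoted in this post, in USD per million tokens.
INPUT_USD_PER_MTOK = Decimal("10")
OUTPUT_USD_PER_MTOK = Decimal("50")
MILLION = Decimal(1_000_000)


def bill_usd(input_tokens: int, output_tokens: int, degraded: bool = False) -> Decimal:
    """Return the charge in USD for one request.

    `degraded` is a hypothetical flag marking whether the safeguard fired.
    It is deliberately unused: per the post, the price does not change.
    """
    cost = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return cost / MILLION


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 40,000 output tokens.
    normal = bill_usd(200_000, 40_000, degraded=False)
    weakened = bill_usd(200_000, 40_000, degraded=True)
    print(f"normal: ${normal}  degraded: ${weakened}  same: {normal == weakened}")
```

Under these assumptions the two charges are equal by construction, so the receipt cannot tell you which kind of answer you received.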

The narrow reading says this only touches a handful of rival labs. The operative scope is “requests targeting frontier LLM development,” and Anthropic decides what qualifies. The examples are not exotic either. A pretraining pipeline is a data loader and a sharding strategy. Distributed training is multi-GPU coordination and gradient sync. None of it carries a scale tag a classifier can read. The same patterns appear whether you are training a frontier system or a small model for your own product.

The borderline case.

A classifier makes mistakes. The only question is what a false positive costs you. A visible refusal that false-positives is recoverable. You see the block and route around it. A silent degradation that false-positives is undetectable. You cannot tell a sabotaged answer from a hard problem, or a steering vector from your own bad prompt. The failure looks exactly like the model having an off day, so you blame yourself or your data before you reach the real explanation, because Anthropic arranged for you never to reach it.

This is why “0.03% of traffic” is not reassurance. The number measures how often the safeguard fires, not how often it fires on the wrong person. And you’re asked to take that number on faith. It comes from Anthropic’s own evaluation, against a benchmark Anthropic built, scored by a classifier Anthropic trained, on a definition of “frontier LLM development” Anthropic wrote and has not published. There is no external set, no independent audit, no way for a customer to reproduce it. The one party that benefits from the number being small is the only party that can measure it, and the mechanism it describes is engineered to leave no trace a third party could count. You are being reassured by a statistic that, by the design of the thing it measures, no one outside the company could ever check.

Forget frontier labs. The line is your SaaS product.
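The gap between how often the safeguard fires and how often it fires on the wrong person can be shown with simple arithmetic. The sketch below takes the 0.03% firing rate quoted above and pairs it with several precision values. Those precision values, the request volume, and the function names are hypothetical, chosen only to show that one firing rate is consistent with very different false-positive counts. None of them come from the system card.

```python
# 0.03% of traffic, the figure quoted above, written as 3 per 10,000 requests
# so the arithmetic stays in exact integers.
FIRING_RATE_PER_10K = 3


def firings(total_requests: int) -> int:
    """Number of requests on which the safeguard fires."""
    return total_requests * FIRING_RATE_PER_10K // 10_000


def false_positives(total_requests: int, precision_pct: int) -> int:
    """Firings that hit the wrong person.

    `precision_pct` is a hypothetical input: the percentage of firings that
    land on genuine targets. It is exactly the number that is not published.
    """
    return firings(total_requests) * (100 - precision_pct) // 100


if __name__ == "__main__":
    requests = 1_000_000  # hypothetical traffic volume
    for precision_pct in (99, 50, 10):
        wrong = false_positives(requests, precision_pct)
        print(f"precision {precision_pct}%: {firings(requests)} firings, {wrong} on the wrong person")
```

With one million requests the safeguard fires 300 times in every scenario. The number of those firings that are mistakes ranges from 3 to 270 depending on a precision figure that the firing rate alone does not reveal.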

The frontier example is the comfortable case, the one that lets a normal software company tune out. The category that should worry you is “competing service,” and the Terms draw it wide: you may not use Claude to build products that compete with Anthropic’s Services. Those Services are no longer a model behind an API. They are an expanding product surface that has rolled over its own customers. Claude Cowork shipped as a general work agent, and its connector list reads like the SaaS categories it absorbs. Harvey and Legora, legal-workflow companies worth billions, build on Claude and now compete with Cowork’s legal automation. DocuSign ships as a built-in connector, which turns an eSignature business into a feature inside someone else’s agent. FactSet and MSCI ship natively. None of these are labs. Several...
