It blocked us at 'hello!' Anthropic Fable 5 refusing innocuous prompts

Hyper-vigilant safety classifiers turn Fable into cautionary tale

Thomas Claburn

Senior reporter

Published Wed 10 Jun 2026 // 21:20 UTC

UPDATED Anthropic's newly released Claude Fable 5 generative AI model is trying so hard to be safe that it's hurting its own userbase. Customers attempting to use the AI knowledge regurgitator are reporting that the model is refusing to answer harmless questions, an issue that has annoyed security researchers following past model releases. Anthropic warned that it had tuned Fable 5's guardrails conservatively: "they’ll sometimes catch harmless requests, though they trigger, on average, in less than five percent of sessions," the company said, promising to "reduce false positives as quickly as we can."

The company did not immediately respond to a request to quantify model refusals. So it's unclear whether the actual false positive rate is greater or less than five percent. But with an estimated 18 to 30 million users worldwide, even a small percentage of thwarted users makes a racket.

Mike Famulare, principal research scientist at the Institute for Disease Modeling, part of the Global Health Division of the Gates Foundation, reports (#66657) that Claude Fable 5 balks at inputs like "Hello."

"In Claude Code, Fable 5's input safety classifier emits a model_refusal_fallback (silent switch to Opus 4.8) on the first turn of essentially every session on my account — including a session whose only user input is the word hello!. No repo content, no tool calls, and no file reads are in context when it fires."

He is not the only frustrated customer. Many other bug reports have been filed in Anthropic's Claude Code GitHub repo since Fable 5 debuted. These include: [Bug] Fable 5 model safety filters causing false positives on benign messages #66587; Fable 5 refuses to assist with 'Application Security Architect resume' editing #66655; and [Feature Request] Allow Fable 5 usage for non-research lab management systems #67062, among others.

On social outrage site X.com, Derya Unutmaz, an immunologist and professor at the Jackson Laboratory for Genomic Medicine, notes, "The word 'cancer' is flagged as a biosecurity risk by Claude Fable 5!" Similar complaints show up in Reddit threads.

Fable 5 is unusual because Anthropic has chosen to conceal safety interventions that try to block rival frontier model development. The classifiers designed to catch cybersecurity, biology and chemistry, and distillation attempts fall back on the latest Claude Opus model and the user gets notified. But the counter-competition surveillance, per the company's system card [PDF], "will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)."

"Prompt modification" without notice is functionally a man-in-the-middle attack, though one that Anthropic estimates "will impact ~0.03 percent of traffic, concentrated in fewer than 0.1 percent of organizations."

As developer Clay Merritt fumes, "Anthropic’s Fable 5 silently sabotages its answers when it detects AI/ML work. No refusal. No notice. Purposeful degradation invisible to the user."

Anthropic expects cyber defenders and critical infrastructure providers to use its Claude Mythos 5 model, which shares the underlying model of Fable 5 but without the same safeguards. Doing so, however, requires participating in the company's Project Glasswing program or the trusted access program that's being rolled out for select biology researchers.

Devon (last name withheld by request), founder of Abliteration.ai, a service that assists with model abliteration (guardrail removal), told The Register in a phone interview that while there's some degree of fearmongering and marketing hype coming from the big AI labs, it's also fair to say that there are legitimate concerns about how frontier models get used.

"Anthropic's making a big bet on their brand that people will trust their brand so much they'll just deal with [refusals]," he said. "But in the long term, people are not just going to accept these companies that centralize control over their lives and what they can have information about." ®

Update: In a statement provided to The Register on Wednesday evening, an Anthropic spokesperson acknowledged that the company had made its safeguards too stringent and said it was also working to reduce false positives for biological research.

"We’re changing Fable 5’s safeguards for frontier LLM development to make them visible.

"Starting this week, flagged requests...
