Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude
Simon Willison’s Weblog
Subscribe
Sponsored by: AWS — If you're building with AI, AWS Summit NYC on June 17 is the room you want to be in. 200+ sessions. Totally free. Register here
11th June 2026 - Link Blog
Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude . Big scoop for Maxwell Zeff at Wired:
“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible.” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”
There's been a huge outcry about Anthropic's policy, tucked away in their system card, that Claude Fable/Mythos would identify "requests targeting frontier LLM development" and "limit effectiveness" without notifying the user.
It's good news that they're dropping the invisible aspect of this. It would be a whole lot better of they dropped this category of refusals entirely.
Update : More details from @ClaudeDevs on Twitter:
We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible.
Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days).
We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.
Posted 11th June 2026 at 3:45 am
Recent articles
Initial impressions of Claude Fable 5 - 9th June 2026
Running Python code in a sandbox with MicroPython and WASM - 6th June 2026
Claude Opus 4.8: "a modest but tangible improvement" - 28th May 2026
This is a link post by Simon Willison, posted on 11th June 2026.
ai<br>2,067
generative-ai<br>1,825
llms<br>1,793
anthropic<br>295
claude<br>281
ai-ethics<br>314
claude-mythos<br>11
Monthly briefing
Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.
Pay me to send you less!
Sponsor & subscribe
Disclosures
Colophon
©
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026