Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

Simon Willison’s Weblog

11th June 2026 - Link Blog

Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude. Big scoop for Maxwell Zeff at Wired:

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

There's been a huge outcry about Anthropic's policy, tucked away in their system card, that Claude Fable/Mythos would identify "requests targeting frontier LLM development" and "limit effectiveness" without notifying the user.

It's good news that they're dropping the invisible aspect of this. It would be a whole lot better if they dropped this category of refusals entirely.

Update: More details from @ClaudeDevs on Twitter:

We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible.

Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days).

We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.

Posted 11th June 2026 at 3:45 am

This is a link post by Simon Willison, posted on 11th June 2026.

Tags: ai, generative-ai, llms, anthropic, claude, ai-ethics, claude-mythos
