"Trust Us" Is Not a Control Surface: Anthropic and the Case for Open Weights

"Trust Us" Is Not a Control Surface

I run a small mortgage company in Georgia. I am not an AI researcher. I do not work at a lab. I am a customer. I pay these companies real money every month, and their tools are part of how I work: the software, the market analysis, the grunt work that used to take a staff. People like me are the ones this technology is supposedly being protected for. So understand what this is. It is not a competitor whining. It is a paying customer telling you what your vendor just admitted in writing, and what they plan to do next with a trillion dollars behind them.

It took them 24 hours to show the whole plan.

What they shipped

On June 9, Anthropic released Claude Fable 5, the most capable AI model ever offered to the public. That is their own framing, and nobody disputes it. The dispute is over what they attached to it. Three things, in their own words.

First, disclosed routing . From the launch announcement: “When Fable’s classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead. Users will be informed whenever this occurs.” They add that “more than 95% of Fable sessions involve no fallback at all.” Fine. You can argue the thresholds, but at least the product tells you when you are not getting the product.

Second, read this one twice, because it comes from their own system card. For requests that look like frontier AI development, building training pipelines, training infrastructure, chip design:

“Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning.”

Strip the jargon and here is what that paragraph says: we will degrade the product you paid for, we will decide when, and we will not tell you. Anthropic estimates this touches about three hundredths of one percent of traffic. The number is not the point. The precedent is. A company wrote down, in its official documentation, that it sabotages its own product in secret, and shipped it expecting applause.

Third, mandatory surveillance . Every prompt and every output sent to a Mythos-class model gets retained for 30 days. Every platform. No exceptions. Including enterprise customers who had signed zero-data-retention agreements. Those contracts just stopped applying to the new model class. Nobody renegotiated. The justification, in their words: “The data will help us defend against complex and novel attacks... as well as help us identify and reduce false positives.” Your confidential data, conscripted into their security program, whether your NDA allows it or not. Microsoft barred its own employees from the model within one day. That is how fast serious people priced in what this means.

And do not picture that retention as a few sentences you typed into a chat box. Modern AI rarely runs on the question alone. Thousands of people now run agent and memory stacks wired straight into these models: OpenClaw, the assistant people self-host on their own machines; Nous Research’s Hermes agent; gbrain, the memory layer Garry Tan of Y Combinator built for his own agents and open-sourced; and a whole ecosystem of tools like them built on vector databases. The point of this tooling is to hold years of notes, files, deals, and conversations, and to attach the relevant history to every request so the model has context. Anthropic sells the same machinery itself: memory that carries your context across sessions is a flagship feature now. So the user types one line, and the stack attaches the archive. Once that context is in the prompt, it is retained for 30 days like everything else. The rule does not distinguish between the question you asked and the years of private history stapled to it. The careful ones saw this coming. That is why they signed zero-data-retention agreements. Until 24 hours ago, they had a contract that said none of it would be kept.

Which means there is a second, quieter bomb in this. Anyone who fed sensitive information into Fable yesterday or today, before they understood that Anthropic had switched on mandatory 30-day retention overnight with no way to opt out, may have already breached a confidentiality agreement they spent years honoring. A lawyer with a client. A doctor with a chart. A contractor under an NDA that requires zero retention. They did nothing wrong. They used a tool they had every reason to believe still followed the terms they signed, and the vendor changed those terms underneath them and let them keep typing. You cannot consent to a policy you were not told had changed.

And sitting on top of all of it: the same underlying model ships two ways. As Mythos 5,...

"Trust Us" Is Not a Control Surface: Anthropic and the Case for Open Weights

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

Apple WWDC 2026 Livestream

Claude Fable 5

It's Not Just X. It's Y

Show HN: GoPeek – open links in live mini browser windows without new tabs