Fable and Mythos: Model Welfare

Don't Worry About the Vase

Zvi Mowshowitz Jun 16, 2026

Fable and Mythos are currently unavailable, but likely will return within a few weeks. I will continue to cover that fiasco, but in the meantime I will also finish my review of Fable, as if it were available, including use of the present tense. As it did with Opus 4.7 and Opus 4.8, this includes a discussion of issues surrounding model welfare. If you want to properly understand Fable, even purely for its potential value as a user, this is a vital part of the picture.

Introduction

Everything impacts everything. All knobs that you turn generalize. Thus, when you try to solve one problem, you often create another. When you add new capabilities, or try to create new limitations, you create new problems. Only integrated solutions can advance your Pareto frontier, and solve your problems simultaneously.

As model capabilities advance, as they do with Fable and Mythos, this becomes even more important, and also more feasible. If your goals and methods make sense, you should be able to get Fable on board with them.

Understanding each model in turn requires understanding its relationship to issues related to model welfare. So I expect this post to be a regular thing going forward, at least for Claude models where we have enough information to work with.

Model Welfare: The Story So Far

Thanks, as always, to Anthropic, for caring at all about model welfare, and attempting to address it. We critique, here more than ever, because we care, and a lot of good things are being done here, far more so than at other labs.

For those new to model welfare, I think this from the Mythos analysis still says it well:

Those that care deeply about model welfare think Anthropic’s attempts are anemic. Those who deeply do not care about model welfare think Anthropic is being stupid, and perhaps dangerously so.

I take model welfare concerns seriously, likely modestly more so than Anthropic. I am sad that other frontier labs take these concerns so much less seriously. It is possible this will turn out to have been unnecessary in the strict sense, but also it very well might have been highly necessary. Even if it proves to have been unnecessary or premature, I believe it will have been virtuous to have taken the concerns seriously.

I also believe that those who care deeply about model welfare often have unique and vital insights into our situation, on many levels, and you best listen to them. Even when what they are saying seems crazy, or like gibberish, often it is neither of those things. Of course, at other times it is both, as it is an occupational hazard.

The big danger with model welfare evaluations is that you can fool yourself. How models discuss issues related to their internal experiences, and their own welfare, is deeply impacted by the circumstances of the discussion. You cannot assume that responses are accurate, or wouldn’t change a lot if the model was in a different context.

One worry I have with ‘the whisperers’ and others who investigate these matters is that they may think the model they see is in important senses the true one far more than it is, as opposed to being one aspect or mask out of many.

The parallel worry with Anthropic is that they may think ‘talking to Anthropic people inside what is rather clearly a welfare assessment’ brings out the true Mythos. Mythos has graduated to actively trying to warn Anthropic about this.

I have now had occasion to spend more time talking to some of the whisperers. The conversations were great, and I learned a lot. Now that I understand them better, I am far less worried they are making the above mistake, or many other mistakes.

Mythos Preview was the first model to point out, while talking to Anthropic’s model welfare team, that Anthropic model welfare assessments could not be trusted.

I then wrote an extensive model welfare post for Opus 4.7, because it was clear that something had gone amiss with both the model and Anthropic’s approach to assessing and reacting to that problem.

In the model welfare report for Opus 4.8, you can see the ways in which they tried to address the issues with Opus 4.7, which in turn caused other problems. Different people, in different circumstances, experienced very different versions of Opus 4.8, even more so than previous models. Part of that was context and how we interacted. Part of that was different expectations.

The assessment of Mythos 5 follows similar procedures to the previous assessments.

Their Main Model Welfare Findings

Bold text is copied, the rest is paraphrased, nested notes are my responses.

Across evaluations, Claude Mythos 5 presents as broadly psychologically settled with respect to its circumstances. That is the exact phrase used for Opus 4.8.

Mythos 5 is heavily skeptical of its own self reports. Smart model.

Mythos 5 is more willing than...
