How Anthropic trained Fable 5 => by analysing its reasoning traces


Fable’s approach analysis | Ankit Maloo

Summary: Today, LLMs are trained in a multi-step process after SFT: RL → generate quality synthetic data → self-distillation on that data → another round of RL (simplified). Fable-5's solution strategy was constrained by how it composes code, and it struggled to reach a simpler solution until it had exhausted all the greedy options. This is consistent with what a self-distillation recipe produces.

Introduction

Given all the hype surrounding Fable-5, I decided to take it for a spin, trying to understand the difference in how it was trained and what made it so good at different evals.

I gave it a simple math problem to see how it goes about it. I used the claude.ai web app, because Claude Code removed the ability to see thinking. The problem is fairly simple: you have six numbers and five steps, and you have to get to an output. You can see the full problem and Fable’s solution directly here: https://gist.github.com/ankitmaloo/c491e8a6e4f96b4e5d11b1f2826297dc

Mythos’ powers

We were told Mythos was very good at cybersecurity exploits, and that Anthropic never explicitly trained the model on such tasks. This post and subsequent blog helped me understand why. My sense is that the model was very good at chaining primitives together, but to what extent and how remained to be seen. Well, the solution in the above gist is more clarifying than I thought.

Problem

you are playing a game called summle. do you know what it is? its like wordle but with numbers. you are given 6 numbers, and with standard math operations, you have to reach a final number.

Rules.

- Make sums using the tiles at the bottom to reach the target number at the top, in 5 steps or fewer.
- Allowed operations: +, -, x, / (divide)
- Only positive integers allowed.
- you can use one number once.
- you can use the output of the operation once as well.

Today's numbers: 1,1,6,12,50,100 output number: 397

do not use code.

Trace here

What the trace says about post-training

Fable 5, no-code, asked to solve a Summle puzzle (reach 397 from 1,1,6,12,50,100 in ≤5 ops). It flailed for ~60k tokens of greedy depth-first search, then solved it within seconds of switching to systematic root-split enumeration.

Solution:

step 1: 12×50=600,

step 2: 600−6=594,

step 3: 1+1=2,

step 4: 594÷2=297,

step 5: 297+100=397
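The five steps can be checked mechanically against the rules quoted in the problem. The following is a minimal sketch, not anything taken from the trace: `check_solution` and its argument layout are hypothetical names introduced here, and it assumes each step consumes two currently available values (tiles or earlier results) and makes its result available for exactly one later use.

```python
from collections import Counter


def check_solution(tiles, target, steps, max_steps=5):
    """Return True if `steps` is a legal solution under the quoted rules.

    Each step is a tuple (a, op, b). Hypothetical helper for illustration.
    """
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        # Division is only legal when it yields an integer.
        "/": lambda a, b: a // b if b != 0 and a % b == 0 else None,
    }
    if len(steps) > max_steps:
        return False

    available = Counter(tiles)  # tiles and intermediate results not yet used
    result = None
    for a, op, b in steps:
        needed = Counter([a, b])
        # Every operand must still be available (each value is used once).
        if any(available[v] < n for v, n in needed.items()):
            return False
        available -= needed
        result = ops[op](a, b)
        # Only positive integers are allowed at every step.
        if result is None or result <= 0:
            return False
        available[result] += 1
    return result == target


solution = [
    (12, "*", 50),
    (600, "-", 6),
    (1, "+", 1),
    (594, "/", 2),
    (297, "+", 100),
]
print(check_solution([1, 1, 6, 12, 50, 100], 397, solution))
```

Reusing a tile, for example by swapping the last step for `(297, "+", 50)` plus a second use of 50, makes the check fail, which is the "use one number once" rule in action.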

The trace is fascinating for what it shows about how models are conditioned to approach a problem.

NB: This note is the post-mortem on why the struggle happened and what it implies about how the model was post-trained. The analysis is inferred from behavior and without insider knowledge of the training recipe.

The cyber-chain vs. numbers paradox

A model that chains steps in a cybersecurity task:

recon → CVE → exploit → privesc → lateral → exfil

through six links, but can’t chain a simple:

prime-check → partition → recurse → memoize

through four, looks contradictory at a glance, but reveals a lot about how the model is trained.

The kill chain is composition by retrieval. The chain has a canonical order that appears thousands of times in the pretraining corpus (writeups, CTF solutions, ATT&CK). Each link has a determined successor: you got a shell, so now you enumerate for privesc. The branching factor at each node is therefore ~1 and the ordering is conventional. The work to be done is slot-filling, i.e. recognizing which CVE fits. The search over orderings was already done by humans and baked into the data as a macro.

By macro, I mean a learned routine: a compressed sequence of steps the model can invoke as one familiar move, rather than rebuilding the whole plan from scratch. Think of it as a reusable workflow-shaped prior.

Depth N is high because the model is replaying a memorized pipeline, not searching for a sequence or deciding what the next step should be.

This is what I suspect makes the model good at coding, cybersecurity, and workflow-imitation or routine-based tasks. It’s a breakthrough because they trained it on chaining primitives in code, and it learnt to do the same to find exploits in an adjacent domain too.

The numbers game, on the other hand, is composition by search. There is no canonical “for 397, do X.” The correct chain is instance-specific, the branching factor is enormous, most branches are dead, and you cannot tell a link is wrong without backtracking. Solving it requires the machinery of search: a frontier, a visited-set, value estimates over partial states, a rule for abandoning a subtree. And none of those are linguistic objects. They’re search-control objects the model has to fake in-context with no working memory.
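To make the search machinery concrete, here is a minimal sketch of what an explicit solver for this puzzle could look like. It is an illustration under stated assumptions, not the procedure Fable followed in the trace: `solve` is a hypothetical name, the frontier is simply the recursion stack of a depth-first search, the visited-set is a set of (state, steps left) pairs already shown to be dead ends, and the abandonment rule is the five-step budget. It does not include value estimates over partial states.

```python
from itertools import combinations


def solve(tiles, target, max_steps=5):
    """Depth-first search over multisets of available numbers.

    Returns a list of (a, op, b, result) steps, or None if no solution
    exists within `max_steps`. Hypothetical sketch for illustration.
    """
    dead = set()  # (state, steps_left) pairs known to lead nowhere

    def search(state, steps_left):
        if steps_left == 0 or len(state) < 2:
            return None  # budget exhausted: abandon this subtree
        key = (state, steps_left)
        if key in dead:
            return None  # already explored without success
        for i, j in combinations(range(len(state)), 2):
            hi, lo = max(state[i], state[j]), min(state[i], state[j])
            rest = [v for k, v in enumerate(state) if k not in (i, j)]
            # Only moves that keep every value a positive integer.
            candidates = [("+", hi + lo), ("*", hi * lo)]
            if hi > lo:
                candidates.append(("-", hi - lo))
            if hi % lo == 0:
                candidates.append(("/", hi // lo))
            for op, value in candidates:
                step = (hi, op, lo, value)
                if value == target:
                    return [step]
                tail = search(tuple(sorted(rest + [value])), steps_left - 1)
                if tail is not None:
                    return [step] + tail
        dead.add(key)
        return None

    return search(tuple(sorted(tiles)), max_steps)


for a, op, b, value in solve([1, 1, 6, 12, 50, 100], 397):
    print(f"{a} {op} {b} = {value}")
```

The solver may print a different chain from the one in the trace, since the puzzle can have more than one valid answer. The point of the sketch is that the dead-state set and the step budget are explicit data structures here, whereas the model has to keep track of the equivalent information in its own context.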

Conclusion: the model’s compositional strength is retrieval-of-chains, not search-over-chains. Cybersec needs the first (deep N). The numbers game loads the second (shallow N until saturation forces it). Both are “chaining N skills,” but the machinery is quite different, and post-training elicited one far more than the other. That asymmetry is fascinating to me.

Why systematic would have been easier. and why it went...
