The Reversal Curse: Why LLMs Know Tom Cruise’s Mother But Not Her Son
The same fact, looked at from the other end, becomes invisible to the model.
Cristobal Santana
Jun 16, 2026
When a person learns that Tom Cruise’s mother is Mary Lee Pfeiffer, something automatic happens in the background. We don’t store the fact in only one direction, from the son to the mother. We get the reverse for free: we now also know that Mary Lee Pfeiffer’s son is Tom Cruise. It’s the same fact, looked at from the other end, and we don’t learn the two ends separately. They’re one fact with two entrances.
In classical statistics, this symmetry is built into the math. A correlation between two variables is the same whether you call one of them X or Y. A joint distribution of two things, written P(A, B), has no direction. From that single object you can compute the probability of A given B, P(A | B) = P(A, B) / P(B), or the probability of B given A, P(B | A) = P(A, B) / P(A). Any system that has the joint distribution can recover both. The relationship is symmetric by construction.
Large language models are different, and that’s the surprising part. A language model is the system behind tools like ChatGPT: it’s trained on huge amounts of text to predict what comes next. The architecture these models use, the transformer, places no restriction on which direction a relationship can be stored in, so on paper it could reason in both directions equally well. And yet, when you train one the standard way, by having it predict the next token (a token is a chunk of text, roughly a word or part of a word) given the tokens before it, something strange happens. The model learns a relationship in one direction and almost completely fails to retrieve the same relationship from the other. It will tell you who Tom Cruise’s mother is. It will not reliably tell you who Mary Lee Pfeiffer’s son is. The fact is the same; the model just can’t reach it from the other side.
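To make the contrast concrete, here is a minimal Python sketch. The first half builds a hypothetical joint count table over two binary variables A and B (the numbers are invented for illustration) and computes both conditionals from that one object. The second half is a deliberate caricature of one-directional learning: a plain dictionary, hypothetically named `memory`, that stores only the path from a prefix to its continuation. A real transformer does not store facts in a lookup table; the dictionary only stands in for the idea that a next-token update strengthens the prefix-to-answer path and nothing else.

```python
from fractions import Fraction

# Part 1: a joint distribution P(A, B) has no built-in direction.
# Hypothetical counts over two binary variables A and B (illustration only).
joint_counts = {
    (0, 0): 30, (0, 1): 10,
    (1, 0): 20, (1, 1): 40,
}
total = sum(joint_counts.values())


def p_joint(a, b):
    """P(A=a, B=b), read straight off the count table."""
    return Fraction(joint_counts[(a, b)], total)


def p_a(a):
    """Marginal P(A=a): sum the joint over every value of B."""
    return sum(p_joint(a, b) for b in (0, 1))


def p_b(b):
    """Marginal P(B=b): sum the joint over every value of A."""
    return sum(p_joint(a, b) for a in (0, 1))


def p_a_given_b(a, b):
    """P(A=a | B=b) = P(A=a, B=b) / P(B=b)."""
    return p_joint(a, b) / p_b(b)


def p_b_given_a(b, a):
    """P(B=b | A=a) = P(A=a, B=b) / P(A=a)."""
    return p_joint(a, b) / p_a(a)


# Both directions come from the same object, and they agree via Bayes' rule.
print(p_a_given_b(1, 1))  # 4/5
print(p_b_given_a(1, 1))  # 2/3
print(p_a_given_b(1, 1) * p_b(1) == p_b_given_a(1, 1) * p_a(1))  # True

# Part 2: a caricature of one-directional next-token learning.
# Hypothetical stand-in, NOT a transformer: each training sentence only
# records the path from its exact prefix to its continuation.
training_pairs = [
    ("Tom Cruise's mother is", "Mary Lee Pfeiffer"),
]

memory = {}
for prefix, continuation in training_pairs:
    memory[prefix] = continuation  # forward path only; nothing is stored for the reverse


def complete(prompt):
    """Return the stored continuation for an exact prefix, or None if there is no path."""
    return memory.get(prompt)


print(complete("Tom Cruise's mother is"))      # Mary Lee Pfeiffer
print(complete("Mary Lee Pfeiffer's son is"))  # None
```

Exact-prefix lookup is the crudest possible reading of “predict the next token given the prefix.” It is chosen here only because it makes the asymmetry impossible to miss, not because real models retrieve facts this way.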
Forward, the fact flows: A is B. Backward, the same fact hits a wall: B is A?
In 2023, Berglund and colleagues documented this effect carefully and gave it a name: the Reversal Curse. Two of their experiments show it most directly. In the first, they built a set of made-up facts of the form “name is description,” such as “Daphne Barrington is the director of the film A Journey Through Time,” fine-tuned models on them, and tested both directions. Given the name, the model could recover the description. Given the description, its performance collapsed to near-random guessing. In the second, they used real celebrity-parent pairs and asked GPT-4 about them in both directions. Asked who a celebrity’s parent was, it answered correctly about 79% of the time. Asked the reverse, who a given parent’s child was, it dropped to 33%. The model knew both names. It just couldn’t travel from one to the other.
Two things made this finding hard to ignore. First, it showed up in every model they tested, including GPT-3, GPT-4, Llama, and a range of smaller open models, so it wasn’t a quirk of one system. Second, a separate team at Anthropic, working at the same time and independently of the first group, hit the same wall from a different angle. Grosse and colleagues (2023) were studying which training examples most influence a model’s predictions, using influence functions, a technique from classical statistics that estimates how a model’s behavior would change if a specific training example were removed. They found, almost in passing, that when a model answered a question phrased in one direction, the influential training examples were the ones phrased the same way; examples phrased in reverse had almost no effect. Two independent papers arrived at the same conclusion: to the model, the forward and reverse versions of a fact are nearly separate facts.
That rules out the easy explanations. This isn’t a problem with one architecture, one dataset, or one company’s training pipeline. It’s a property of how models trained to predict the next token store and retrieve what they know. Three years later, it remains one of the cleanest demonstrations of a gap that’s easy to forget: predicting the next token well is not the same thing as understanding a fact. This post covers that finding, why it happens, what people have tried to do about it, and why the most promising recent direction suggests the problem belongs to one specific way of training models rather than to language models in general.
Why the Reversal Happens
The main explanation is mechanical, and once you see it, it’s almost obvious.
These models are trained to predict the next token given everything before it. When the training text says “Tom Cruise’s mother is Mary Lee Pfeiffer,” the training process teaches the model to predict “Mary Lee Pfeiffer” after seeing “Tom Cruise’s mother is.” That update is directional. It strengthens the path from the prefix to the answer. It does nothing, by itself, for the opposite path, from “Mary...