Lost in the Middle: Why LLMs Forget What They Just Read

Modern LLMs can read enormous contexts. They just don't read the middle.

Cristobal Santana Jun 02, 2026

The Problem

If you have ever built a RAG system, where the model retrieves relevant documents and uses them to answer a question, you have probably felt this without naming it. You retrieve twenty chunks, the relevant one is in there somewhere, the model has it in context, and it still answers as if it never saw it. So you add more context to be safe, and somehow the answer gets worse. This is not a retrieval bug, and you cannot fix it by switching to a model with a bigger context window. It is the model not reading its own middle.

This matters more than it sounds. A system that silently ignores part of its input is a system that fails in ways you cannot see in a demo. It works when you test it with three documents and the answer is in the first one. It breaks in production when the answer happens to land in document eleven of twenty, and now you have a support ticket, a user who does not trust the output, and an engineer trying to reproduce a bug that depends on the position of a document nobody thought to track.

There is an old finding in cognitive psychology that fits this almost perfectly. Murdock described the serial position effect in 1962: people recall the first and last items in a list far better than the items in between. Human memory is U-shaped. We remember the beginning, we remember the conclusion, and the middle turns into a vague impression.

Modern machine learning was supposed to be different. The transformer was sold partly on its ability to attend equally to every position in the input. The attention mechanism, in principle, lets each token reach across the entire context with equal ease. Everything is, mathematically, the same distance away. That is the implicit promise of long-context models: feed it a long document and it will actually use it.

In 2023, Liu and colleagues at Stanford and Berkeley showed that this assumption is wrong. Their paper, Lost in the Middle: How Language Models Use Long Contexts, showed that current LLMs behave a lot like the human U-shape. Information at the beginning and end of the context gets used. Information in the middle, even when it is the exact answer to the question being asked, often gets ignored. The model can read it. It just does not pull from it the way it pulls from the edges.

This post is about why that happens, what it costs you if you are building real systems on top of LLMs, and why it is still an open problem in 2026.

What’s Actually Happening

The experiment Liu et al. designed is clean enough that you can picture it right away. The model receives a question plus several documents, exactly one of which contains the answer. The other documents are real but irrelevant, distractors pulled from the same corpus. The key move is that they change the position of the answer-bearing document within the context, from the first slot to the last, keeping everything else the same. Same question, same documents, same model. Only the position of the answer changes.

If a model really used its context evenly, accuracy should be flat across positions. Instead they found a clear U-shape: high accuracy when the relevant document sits at the start or the end, and a clear drop when it sits in the middle. The model is not using its middle.

What made this land was that it was not a quirk of one model. They tested open-source and proprietary models alike (GPT-3.5-Turbo, Claude, MPT, LongChat variants) and the U-shape showed up in every one, with varying severity. Models with longer context windows did not escape it. If anything, the drop got worse as the context grew. This is what made the finding important for anyone deploying these systems: it was not a bug in one architecture you could swap out, it was a property of how transformer-based models, as a class, process long inputs.

Later replications across newer models have confirmed that while the size of the effect can be reduced, the U-shape stays. The middle of the context is still, in 2026, a partial blind spot for most LLMs.

As the context grows, the model pays less attention to the middle. The edges stay sharp.
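If you want to check your own stack for this, the position sweep described above is simple to approximate. What follows is a minimal sketch, not the paper's evaluation code: the prompt layout, the example dictionary keys, the substring scoring, and every function name (build_context, build_prompt, position_sweep, edge_only_model) are assumptions made for illustration. The ask_model argument stands in for whatever function sends a prompt to your model and returns its reply.

```python
from typing import Callable, Dict, List


def build_context(answer_doc: str, distractors: List[str], position: int) -> List[str]:
    """Return the documents in order, with the answer document at `position` (0-based)."""
    if not 0 <= position <= len(distractors):
        raise ValueError("position out of range")
    return distractors[:position] + [answer_doc] + distractors[position:]


def build_prompt(question: str, docs: List[str]) -> str:
    """Hypothetical prompt layout: numbered documents, then the question."""
    numbered = [f"Document [{i + 1}]: {doc}" for i, doc in enumerate(docs)]
    return "\n".join(numbered) + f"\n\nQuestion: {question}\nAnswer:"


def position_sweep(examples: List[dict], ask_model: Callable[[str], str]) -> Dict[int, float]:
    """Accuracy per answer position, holding question and documents fixed.

    Assumes every example has the same number of distractors and the keys
    "question", "answer", "answer_doc", and "distractors".
    """
    n_positions = len(examples[0]["distractors"]) + 1
    accuracy: Dict[int, float] = {}
    for position in range(n_positions):
        correct = 0
        for ex in examples:
            docs = build_context(ex["answer_doc"], ex["distractors"], position)
            reply = ask_model(build_prompt(ex["question"], docs))
            # Lenient scoring (an assumption): a hit if the gold answer appears in the reply.
            correct += ex["answer"].lower() in reply.lower()
        accuracy[position] = correct / len(examples)
    return accuracy


def edge_only_model(prompt: str) -> str:
    """Toy stand-in for a real model: it only 'reads' the first and last document."""
    doc_lines = [line for line in prompt.split("\n") if line.startswith("Document [")]
    return doc_lines[0] + " " + doc_lines[-1]
```

Passing edge_only_model as ask_model gives the most extreme version of the U-shape: full accuracy at the two edge positions and zero in between. With a real model, a flat curve across positions is what even use of the context would look like, and a dip in the middle is the effect described here.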

Why the Middle Disappears

Three explanations have been proposed, and the honest answer is that the field has not fully settled which one matters most. They probably stack on top of each other. You do not need to master the math to make good decisions here, but the intuition is worth having, because it tells you why no prompt trick makes the problem go away.

The first is about the data these models were trained on. In typical text, important information sits at the beginning (introductions, headlines, opening paragraphs) and at the end (conclusions, key takeaways). The middle tends to be supporting or transitional. A model trained to predict the next token...
