Why LLMs invent answers instead of saying they don't know

Hallucination vs Confabulation: Why LLMs Invent Answers Instead of Saying "I Don't Know"


When a model has no real fact to give you, it usually doesn't stay silent. It fills the gap with the most plausible continuation, and it sounds just as confident as when it's right.

Cristobal Santana Jun 30, 2026

Ask a language model for a fact it doesn’t have, and watch what it does. It usually doesn’t stop. It gives you an answer, cleanly formatted, phrased with the same confidence it uses for things it actually knows. Ask it for a source to back a claim, and if it has no real one, it can build a citation from nothing: plausible authors, a plausible title, the right-looking journal and year, sometimes a DOI that resolves to nothing. The output is shaped exactly like a correct answer. That shape is the whole problem, because nothing on the surface separates the real answers from the invented ones.

In 2023 a lawyer in New York learned this in front of a federal judge. Steven Schwartz, representing a client in Mata v. Avianca, asked ChatGPT to find case law for his argument, and it returned six decisions with names, quotes, and internal citations. He filed them. None of the cases existed. When he grew nervous and asked ChatGPT whether they were real, it assured him they were. They still weren’t. The court sanctioned him, and the episode became the standard cautionary tale for what these models do when they run out of facts.

The courtroom story is dramatic, but the version that does the most damage in real systems is quieter. A model that gives a confident wrong answer is more dangerous than one that gives no answer, because a blank space gets noticed and a fluent paragraph does not. Picture an internal assistant built on a company’s documentation, or a support bot answering customers. Ask it something the documents cover and it answers well. Ask it something just outside that coverage, and a model that won’t say “I don’t know” will produce a clean, specific, wrong answer instead. The failure here isn’t loud. The system doesn’t crash; it just gives a wrong answer in a voice that sounds exactly like a right one. That answer passes review, reaches the user, and gets believed, because it reads like every correct answer next to it.
If you read my earlier post on the Reversal Curse, this is the same kind of silent gap, approached from a different angle.

Hallucination, Confabulation, and Why the Word Matters

The field calls all of this “hallucination,” and the word has been stretched until it covers almost every way a model can be wrong. Sebastian Farquhar and colleagues, in a 2024 Nature paper, make the useful point that there’s no reason to expect a single mechanism behind every kind of error, so lumping them under one word hides more than it reveals.

The standard surveys (Huang and colleagues, 2025) split the term along two axes. One is faithfulness: whether the output stays true to the source you gave the model. The other is factuality: whether the output is true about the world. A summary that contradicts the document it’s summarizing is a faithfulness failure. A confident claim about a person who doesn’t exist is a factuality failure. They look similar on the surface and come from different places.

This post is about one specific kind, and it has a better name: confabulation. The word comes from clinical psychology, where it describes a person who fills a gap in memory with a fabricated account, told sincerely and with full confidence, and with no intent to deceive. That last part matters. The person isn’t lying, because lying requires knowing the truth and choosing to hide it. They simply have a hole where a memory should be, and the mind produces something plausible to fill it. “Hallucination” implies perceiving something that isn’t there, which is the wrong picture for a language model. “Confabulation” implies filling a gap with a confident guess, which is exactly right.

Farquhar’s group uses the term in a precise way, for the subset of errors that are arbitrary: answers that change if you run the model again with a different random seed, because there was never a stable fact behind them in the first place. I’ll use it the same way.

The intuition is the student who didn’t study but refuses to leave the answer blank. Asked a question they can’t recall, they write something that sounds like the textbook, in the right tone, with the right shape, and sometimes they’re even close. The model does this constantly, and it does it well, because producing text that sounds right is the one thing it was built to do.

Why It Happens

A language model is trained to predict the next token, the next chunk of text, given everything before it. That’s the whole objective. It does not have one mode for recalling a stored fact and another for making something up. It has a single mode: produce the most probable continuation...
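To make the single-mode point concrete, here is a minimal toy sketch in Python. It is not a real language model and not the method from the Farquhar paper; the two score tables (`known`, `unknown`) and the helper names are hypothetical, chosen only for illustration. It shows two things: sampling from a softmax always returns some candidate, with no built-in way to return nothing, and when the scores are nearly flat the sampled answer varies from draw to draw, which is the arbitrariness that the confabulation definition above points at.

```python
import math
import random
from collections import Counter


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits.values())
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_answer(logits, rng):
    """Draw one candidate. There is no code path that returns 'nothing'."""
    probs = softmax(logits)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def answer_entropy(logits, n_samples=200, seed=0):
    """Entropy of repeated draws: near zero if stable, high if arbitrary."""
    rng = random.Random(seed)
    counts = Counter(sample_answer(logits, rng) for _ in range(n_samples))
    return -sum((c / n_samples) * math.log(c / n_samples)
                for c in counts.values())


# Hypothetical scores, invented for this sketch (not taken from any model).
# "known": one candidate dominates, as if a stable fact sits behind it.
known = {"Paris": 9.0, "Lyon": 1.0, "Nice": 0.5, "Lille": 0.0}
# "unknown": nearly flat scores, as if nothing stable sits behind the answer.
unknown = {"1987": 1.1, "1989": 1.0, "1991": 0.9, "1993": 1.0}

known_entropy = answer_entropy(known)
unknown_entropy = answer_entropy(unknown)

print(f"known:   entropy = {known_entropy:.3f}")
print(f"unknown: entropy = {unknown_entropy:.3f}")
```

In both cases every draw is a fluent-looking answer. The only visible difference is in the spread across repeated draws, which a reader who sees a single response never gets to observe. Comparing exact strings is a simplification here; real answers can be worded differently while meaning the same thing.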
