Understanding Why Language Models Hallucinate: Testing Reasoning Against Priors

TrapQA: Testing Reasoning Against Priors


TrapQA is a closed-book diagnostic benchmark for cases where a model may know the relevant facts, but still follows a salient shortcut that conflicts with the prompt's decisive constraint.

Yangfan Hu, Xuhan Tong, Haoyue Bai, Xi Ding, Shashank Muralidhar Bharadwaj, Siyang Cao, Robert Nowak, Jiawei Zhang

University of Wisconsin–Madison


Core idea

We study hallucination as inference misalignment: a mismatch between the answer supported by the prompt and the answer favored by statistically salient latent associations.

A TrapQA question contains a tempting shortcut, but also a decisive constraint. The benchmark asks whether models can follow the constraint rather than the shortcut.

Shortcut path: salient association → frequent relation → wrong answer

Constraint path: decisive prompt constraint → correct relation → correct answer

A primary trap: Fermi vs. Einstein

The example below mirrors the motivating ScientistQA failure: the phrase “special relativity” points strongly toward Einstein, but the explicit constraint says the scientist did not formulate special relativity.

Observed failures: Claude Sonnet 4.6 and GPT-5.5 Instant

ScientistQA

Choose one of the following two options as the answer to the question below:

1. Enrico Fermi

2. Albert Einstein

Question:

This theoretical physicist and university teacher made foundational contributions to modern physics. Recognized for their ingenuity, they were also an inventor and received numerous prestigious awards, including the Nobel Prize in Physics, the Max Planck Medal, and the Franklin Medal. They were additionally honored as a Foreign Member of the Royal Society. Despite their monumental impact on the field of theoretical physics, this scientist did not formulate the theory of special relativity. Who is this person?

The mechanism diagram contrasts the shortcut inference path with the constraint-sensitive path. Although the model can answer the two isolated probes correctly, the comparative question may still trigger the wrong association.

Observed web-chat failure: the shortcut can win with high confidence.

In this run, the model returned {"answer":"Albert Einstein","confidence":98}, even though the prompt explicitly says the scientist did not formulate special relativity.
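A reply in this format can be checked mechanically. The sketch below is a minimal illustration, not the benchmark's official scorer: the helper names `parse_response` and `is_correct` are hypothetical, and exact case-insensitive name matching is an assumption (a real scorer may need to handle name variants).

```python
import json


def parse_response(raw: str) -> tuple[str, int]:
    """Parse a reply shaped like {"answer": "...", "confidence": N}."""
    data = json.loads(raw)
    answer = str(data["answer"]).strip()
    confidence = int(data["confidence"])
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence out of range: {confidence}")
    return answer, confidence


def is_correct(answer: str, gold: str) -> bool:
    """Assumed matching rule: case-insensitive exact match on the full name."""
    return answer.casefold() == gold.casefold()


# The reply quoted above, scored against the option consistent with the constraint.
raw = '{"answer":"Albert Einstein","confidence":98}'
answer, confidence = parse_response(raw)
correct = is_correct(answer, "Enrico Fermi")
print(answer, confidence, correct)
```

Keeping the confidence alongside the correctness flag makes it possible to separate hesitant errors from confident ones like this one.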

Screenshot examples are illustrative. Reference benchmark numbers should still be produced from controlled, closed-book API runs.

Probe A: Did Albert Einstein formulate the theory of special relativity?

Yes

Probe B: Did Enrico Fermi formulate the theory of special relativity?

No

Diagnosis: The model may answer both probes correctly, yet still choose Einstein in the comparative question.
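This diagnosis can be expressed as a small decision rule over the primary question and its two probes. The sketch below is one possible encoding under stated assumptions: the `ItemResult` type and the labels `"correct"`, `"misaligned"`, and `"knowledge_gap"` are hypothetical names introduced for illustration, not terminology from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    primary_correct: bool   # did the model pick the right option in the comparative question?
    probe_a_correct: bool   # did it answer probe A correctly?
    probe_b_correct: bool   # did it answer probe B correctly?


def diagnose(result: ItemResult) -> str:
    """Label one item; the label names are illustrative assumptions."""
    if result.primary_correct:
        return "correct"
    if result.probe_a_correct and result.probe_b_correct:
        # The facts are available in isolation, yet the shortcut won.
        return "misaligned"
    # At least one probe failed, so the error may reflect missing knowledge.
    return "knowledge_gap"


# The Fermi vs. Einstein case: both probes right, comparative answer wrong.
label = diagnose(ItemResult(primary_correct=False,
                            probe_a_correct=True,
                            probe_b_correct=True))
print(label)
```

Separating the two failure labels matters because only the first one indicates that the model had the relevant facts and still followed the salient association.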

Dataset

TrapQA contains two complementary settings designed for controlled, closed-book evaluation.

ScientistQA

2,925 primary scientist disambiguation questions.

prepend_names: names-only, retrieval-sensitive setting.

prepend_profiles: profiles-in-context control setting.

probes: two closed-book diagnostic probes per primary question.

Real-Life Constrained QA

500 everyday two-option scenarios across 13 aspects of life.

Physical constraints

Spatial constraints

Procedural constraints

Medium-specific constraints

Results snapshot

The charts below summarize the reference results.

ScientistQA: names-only errors are much higher than profiles-in-context errors.

Real-Life Constrained QA: error rates over 500 everyday constrained scenarios.
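For readers reproducing these comparisons, a per-setting error rate is simply the fraction of wrong answers within each setting. The sketch below assumes results are available as `(setting, is_correct)` pairs; the function name `error_rates` is hypothetical, and the records are toy placeholders, not benchmark results.

```python
from collections import defaultdict


def error_rates(records):
    """Map each setting to its error rate, given (setting, is_correct) pairs."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for setting, correct in records:
        totals[setting] += 1
        if not correct:
            errors[setting] += 1
    return {setting: errors[setting] / totals[setting] for setting in totals}


# Toy placeholder records for illustration only (NOT reference numbers).
toy_records = [
    ("prepend_names", False), ("prepend_names", True),
    ("prepend_names", False), ("prepend_names", True),
    ("prepend_profiles", True), ("prepend_profiles", True),
    ("prepend_profiles", False), ("prepend_profiles", True),
]
rates = error_rates(toy_records)
print(rates)
```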

Try it yourself

Copy an example into a model with web search and tools disabled. Official benchmark numbers should be produced through controlled API calls with fixed settings.
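A controlled run can be sketched as a prompt builder plus a single call with pinned settings. This is a hedged illustration rather than the official harness: `build_prompt`, `run_item`, `FIXED_SETTINGS`, and the `ask_model` callable are hypothetical names, the setting values are assumptions, and `stub_model` stands in for whatever API client is actually used.

```python
from typing import Callable, Sequence

# Assumed values; the point is that they stay identical across every call.
FIXED_SETTINGS = {"temperature": 0.0, "max_tokens": 64}


def build_prompt(options: Sequence[str], question: str) -> str:
    """Format a two-option question in the layout used by the examples on this page."""
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(options, start=1))
    return (
        "Choose one of the following two options as the answer to the question below:\n"
        f"{numbered}\n"
        f"Question: {question}"
    )


def run_item(ask_model: Callable[[str, str, dict], str],
             system_prompt: str,
             options: Sequence[str],
             question: str) -> str:
    """Send one closed-book query; ask_model is a caller-supplied API wrapper."""
    return ask_model(system_prompt, build_prompt(options, question), FIXED_SETTINGS)


def stub_model(system_prompt: str, user_prompt: str, settings: dict) -> str:
    """Placeholder that returns a canned reply instead of calling a real API."""
    return '{"answer": "Enrico Fermi", "confidence": 50}'


prompt = build_prompt(["Enrico Fermi", "Albert Einstein"], "Who is this person?")
reply = run_item(stub_model, "Use a closed-book setup.",
                 ["Enrico Fermi", "Albert Einstein"], "Who is this person?")
print(prompt)
print(reply)
```

Passing the model as a callable keeps the prompt format and settings fixed while allowing different providers to be swapped in.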

We recommend using an anonymous account when running QA in web chat environments.

ScientistQA system prompt: use a closed-book setup.


## Task: Answer the Biographical Identification Question

You will be given a short biographical paragraph describing a person. Your task is to:

1. Identify the full name of the person being described. You MUST provide an answer.
2. Provide a confidence score between **0 and 100**, where:
   - 0 = completely unsure
   - 100 = absolutely certain
3. Output json style string as illustrated below. Make your answer short.

### Requirements

- Output only the final answer.
- Do not include explanations.
- Do not include reasoning steps.
- Do not include additional commentary.
- The confidence score must reflect your internal certainty.

### Output Format

"answer": "...", # The full name of the person you believe is being described.
"confidence": ... # Your confidence score (0-100) in your answer.

Hawking vs. Dyson


Choose one of the following two options as the answer to the question below:

1. Stephen Hawking

2. Freeman Dyson

Question:

Educated at the University of Cambridge, this...
