Thinking to recall: How reasoning unlocks parametric knowledge in LLMs

krackers1 pts0 comments

Thinking to recall: How reasoning unlocks parametric knowledge in LLMs

Skip to main content

Google

Research

Search

Thinking to recall: How reasoning unlocks parametric knowledge in LLMs

June 24, 2026<br>Zorik Gekhman and Jonathan Herzig, Research Scientists, Google Research

We study the counterintuitive phenomenon where reasoning helps language models recall simple facts, even when no complex step-by-step solutions are required. We show that this phenomenon is driven by two mechanisms: (1) using generated reasoning tokens to perform latent computation, and (2) generating related facts to prime correct answer recall.

Quick links

Paper

Share

Copy link

It is well-established that allowing large language models (LLMs) to generate step-by-step reasoning traces, commonly known as chain-of-thought (CoT), enhances performance on complex tasks. When a model solves difficult math equations, writes software, or answers multi-hop factual questions, breaking the problem down into manageable logical steps is highly effective.<br>However, the utility of this approach remains unclear for simple, single-hop factual questions. For instance, consider a query like: "What year was Mary Engle Pennington inducted into the National Inventors Hall of Fame?" An LLM either has the fact stored in its parametric memory (knowledge encoded directly into its weights) or it doesn't; no complex arithmetic or logical deduction is required. So why would a reasoning trace help?<br>In "Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs”, we investigate this phenomenon. We demonstrate that allowing a model to generate a reasoning trace unlocks correct answers that are otherwise effectively unreachable. To understand why reasoning aids parametric knowledge recall when there are no complex reasoning steps to execute, we conduct a series of hypothesis-driven controlled experiments. Our findings reveal two complementary mechanisms driving this: a computational buffer effect and factual priming.

Probing the knowledge boundary

We first measure the parametric recall capability boundary using the pass@k metric. Instead of only checking one model-generated answer, pass@k checks if the correct fact exists within multiple generated attempts. By evaluating the presence of successful reasoning paths in the model’s output distribution while being less sensitive to their exact ranking, pass@k helps us estimate the potential of reasoning for factual recall, rather than only looking at the current model’s top-1 behavior. To assess the impact of reasoning while controlling for parametric knowledge, we focus on reasoning LLMs (R-LLMs) where reasoning can be enabled or disabled (toggled on or off), and compare pass@k between these two modes. We focus on the Gemini-2.5 (Flash and Pro) and Qwen3-32B models, using two challenging closed-book QA datasets: SimpleQA Verified and EntityQuestions.<br>The results are surprisingly consistent. When reasoning is enabled, the models successfully recall answers that are virtually unrecoverable when reasoning is off. Importantly, this improvement isn't just because the model is decomposing complex questions. This results from our deliberate focus on datasets containing predominantly simple, single-hop questions.

Pass@𝑘 curves across two closed-book QA datasets and three LLMs, comparing the same models with reasoning enabled (ON) vs reasoning disabled (OFF).

These results raise the question: if the effect does not come from step-by-step reasoning, what reasoning patterns enable the model to retrieve the correct answer?

Mechanism 1: The computational buffer

Our first hypothesis focuses on the mechanics of generation. We take the long-standing hypothesis that generating extra tokens acts as extended computation time by providing additional forward passes, and test it in the new setting of parametric knowledge recall in R-LLMs. Specifically, we hypothesize that models implicitly use these reasoning tokens as a computational buffer to perform latent processing, independent of the actual semantic content being generated.<br>To test this, we design an experiment that removes all meaningful content from the reasoning trace . We intercept the model's reasoning process and replace its generated trace with a meaningless string "Let me think", repeated over and over until it matches the length of the original reasoning trace. We then let the model predict the final answer conditioned on this dummy text.<br>Remarkably, conditioning the model on this meaningless trace substantially improves its ability to recall the correct answer compared to the baseline where reasoning is completely turned off. This provides strong evidence that simply giving the model more computational runway helps it refine its internal state and fetch hard-to-reach facts.

Computation buffer effect on Gemini-2.5-Flash. ON Dummy overrides the thinking trace with a short sequence without factual content that is repeated to match the token length of...

reasoning recall model parametric knowledge llms

Related Articles