Thinking to recall: How reasoning unlocks parametric knowledge in LLMs
Skip to main content
Research
Search
Thinking to recall: How reasoning unlocks parametric knowledge in LLMs
June 24, 2026<br>Zorik Gekhman and Jonathan Herzig, Research Scientists, Google Research
We study the counterintuitive phenomenon where reasoning helps language models recall simple facts, even when no complex step-by-step solutions are required. We show that this phenomenon is driven by two mechanisms: (1) using generated reasoning tokens to perform latent computation, and (2) generating related facts to prime correct answer recall.
Quick links
Paper
Share
Copy link
It is well-established that allowing large language models (LLMs) to generate step-by-step reasoning traces, commonly known as chain-of-thought (CoT), enhances performance on complex tasks. When a model solves difficult math equations, writes software, or answers multi-hop factual questions, breaking the problem down into manageable logical steps is highly effective.<br>However, the utility of this approach remains unclear for simple, single-hop factual questions. For instance, consider a query like: "What year was Mary Engle Pennington inducted into the National Inventors Hall of Fame?" An LLM either has the fact stored in its parametric memory (knowledge encoded directly into its weights) or it doesn't; no complex arithmetic or logical deduction is required. So why would a reasoning trace help?<br>In "Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs”, we investigate this phenomenon. We demonstrate that allowing a model to generate a reasoning trace unlocks correct answers that are otherwise effectively unreachable. To understand why reasoning aids parametric knowledge recall when there are no complex reasoning steps to execute, we conduct a series of hypothesis-driven controlled experiments. Our findings reveal two complementary mechanisms driving this: a computational buffer effect and factual priming.
Probing the knowledge boundary
We first measure the parametric recall capability boundary using the pass@k metric. Instead of only checking one model-generated answer, pass@k checks if the correct fact exists within multiple generated attempts. By evaluating the presence of successful reasoning paths in the model’s output distribution while being less sensitive to their exact ranking, pass@k helps us estimate the potential of reasoning for factual recall, rather than only looking at the current model’s top-1 behavior. To assess the impact of reasoning while controlling for parametric knowledge, we focus on reasoning LLMs (R-LLMs) where reasoning can be enabled or disabled (toggled on or off), and compare pass@k between these two modes. We focus on the Gemini-2.5 (Flash and Pro) and Qwen3-32B models, using two challenging closed-book QA datasets: SimpleQA Verified and EntityQuestions.<br>The results are surprisingly consistent. When reasoning is enabled, the models successfully recall answers that are virtually unrecoverable when reasoning is off. Importantly, this improvement isn't just because the model is decomposing complex questions. This results from our deliberate focus on datasets containing predominantly simple, single-hop questions.
Pass@𝑘 curves across two closed-book QA datasets and three LLMs, comparing the same models with reasoning enabled (ON) vs reasoning disabled (OFF).
These results raise the question: if the effect does not come from step-by-step reasoning, what reasoning patterns enable the model to retrieve the correct answer?
Mechanism 1: The computational buffer
Our first hypothesis focuses on the mechanics of generation. We take the long-standing hypothesis that generating extra tokens acts as extended computation time by providing additional forward passes, and test it in the new setting of parametric knowledge recall in R-LLMs. Specifically, we hypothesize that models implicitly use these reasoning tokens as a computational buffer to perform latent processing, independent of the actual semantic content being generated.<br>To test this, we design an experiment that removes all meaningful content from the reasoning trace . We intercept the model's reasoning process and replace its generated trace with a meaningless string "Let me think", repeated over and over until it matches the length of the original reasoning trace. We then let the model predict the final answer conditioned on this dummy text.<br>Remarkably, conditioning the model on this meaningless trace substantially improves its ability to recall the correct answer compared to the baseline where reasoning is completely turned off. This provides strong evidence that simply giving the model more computational runway helps it refine its internal state and fetch hard-to-reach facts.
Computation buffer effect on Gemini-2.5-Flash. ON Dummy overrides the thinking trace with a short sequence without factual content that is repeated to match the token length of...