Modular Cognitive Architecture Emerges in Large Language Models
Neural Mechanisms of Reasoning in LLMs
Pengrui Han·Jacob Andreas·Evelina Fedorenko†·Andrea Gregor de Varda†
Massachusetts Institute of Technology
† Co-senior authors
Preprint · 2026
Illustrative demo, not a results figure. The probe prompts<br>are scripted, and the right-hand network is a schematic node-link sketch: its<br>per-domain depth profile is informed by the real Qwen2.5-32B attribution, but it does not<br>render individual neurons. The left panel shows canonical literature brain networks. The<br>actual quantitative results are the figures below.
Abstract
The human brain exhibits a striking degree of functional specialization, with distinct<br>networks supporting language, formal reasoning, reasoning about other minds, and reasoning<br>about the physical world. Is this modular organization a fundamental principle of how<br>intelligent systems must be built, or an evolutionary accident specific to biological brains?<br>Here, we test whether a similar organization emerges in Large Language Models, another class<br>of intelligent systems created through a very different optimization process. Using circuit<br>analyses across N = 46 tasks spanning four cognitive domains (language, formal reasoning,<br>social reasoning, physical reasoning), we find that LLMs develop a modular architecture that<br>mirrors the human brain: tasks drawing on the same network in humans recruit overlapping<br>neurons in LLMs, whereas tasks drawing on different networks recruit distinct neurons. The<br>convergent emergence of modularity in brains and neural networks suggests that it may be a<br>fundamental property of intelligent systems.
46 reasoning tasks across four cognitive domains
Language · Formal · Physical · Social
Tasks sharing a brain network recruit the same neurons
4.3× more overlap within-domain · ARI = 0.78
Ablating a domain's neurons selectively breaks it
10.3× larger accuracy drop within- than cross-domain
The same structure holds in every model
6 frontier LLMs · 24B → 123B
Method
Localizing the units that support each task
We localize task-supporting units with attribution patching. For each of<br>46 tasks across four cognitive domains, we build minimal original/alternative input pairs<br>whose correct continuation flips. A unit's importance is its original-vs-alternative activation<br>difference times the gradient of the original−alternative logit difference, summed over examples.<br>We then quantify modular organization from the pairwise overlap of each task's top-0.1%<br>units, and validate it causally by ablating those units and measuring cross-task transfer.<br>We analyze six instruction-tuned LLMs (24B–123B, four families).
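Written out, the importance score described above might look as follows. This is a sketch only: the symbols s_u, a_u, Δ_i, ℓ_i, and n are notation assumed here rather than taken from the manuscript, the sign convention (original minus alternative) is an assumption, and the input at which the gradient is evaluated is left unspecified because this page does not state it.

```latex
\[
s_u \;=\; \sum_{i=1}^{n}
  \bigl( a_u(x_i^{\mathrm{orig}}) - a_u(x_i^{\mathrm{alt}}) \bigr)\,
  \frac{\partial \Delta_i}{\partial a_u},
\qquad
\Delta_i \;=\; \ell_i\bigl(y_i^{\mathrm{orig}}\bigr) - \ell_i\bigl(y_i^{\mathrm{alt}}\bigr)
\]
```

Here a_u(x) is the activation of unit u on input x, y_i^orig and y_i^alt are the correct continuations for the original and alternative inputs of pair i, and ℓ_i(y) is the model's logit for continuation y. A task's selective units would then be the top 0.1% of units ranked by s_u.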
Figure 1. Identifying domain-specific functional organization in large language<br>models. (A) The meta-dataset comprises 46 tasks spanning four cognitive domains:<br>Language (8 tasks, 8,877 pairs), Formal reasoning (20 tasks, 19,941 pairs), Physical reasoning<br>(9 tasks, 9,200 pairs), and Social reasoning (9 tasks, 11,412 pairs). Each domain is grounded<br>in a well-characterized functional network of the human brain: the language network, the<br>multiple-demand network, the intuitive-physics network, and the theory-of-mind network,<br>respectively. (B) Pipeline. (1) For each task we construct minimal contrastive pairs of<br>original and alternative inputs that elicit opposite correct continuations (here, addition vs.<br>subtraction). (2) We run both inputs through the model and record activations at every MLP<br>neuron. (3) Attribution patching scores each unit by the activation difference times the<br>gradient of the original−alternative logit difference, yielding a causal estimate of its<br>contribution. (4) We measure pairwise overlap of top-attributed units across tasks and validate<br>the structure through causal ablation. (C) Example contrastive problems for each domain;<br>original and alternative inputs are matched in surface form so that attribution reflects the<br>reasoning-relevant contrast rather than format.
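Steps (2) and (3) of the pipeline, followed by the top-0.1% selection, can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-layer ReLU network, the names `attribution_scores` and `top_units`, the random weights, and the choice to rank units by absolute score are all hypothetical. With a linear readout, the gradient of the logit difference with respect to each hidden unit is simply the readout weight, so no autograd library is needed.

```python
import numpy as np

def attribution_scores(act_orig, act_alt, grad):
    """(activation difference) x (gradient of logit difference), summed over examples.

    All inputs have shape (n_examples, n_units); returns shape (n_units,).
    """
    return ((act_orig - act_alt) * grad).sum(axis=0)

def top_units(scores, fraction=0.001):
    """Indices of the top `fraction` of units, ranked by |score| (an assumption)."""
    k = max(1, int(round(fraction * scores.size)))
    idx = np.argpartition(-np.abs(scores), k - 1)[:k]
    return set(idx.tolist())

# Toy stand-in for one MLP layer (hypothetical sizes and random weights).
rng = np.random.default_rng(0)
n_examples, n_in, n_units = 32, 16, 4000
W_in = rng.normal(size=(n_in, n_units))
w_diff = rng.normal(size=n_units)  # readout for (original - alternative) logit

# Minimal pairs: the alternative input differs from the original in one feature.
x_orig = rng.normal(size=(n_examples, n_in))
x_alt = x_orig.copy()
x_alt[:, 0] *= -1.0

h_orig = np.maximum(x_orig @ W_in, 0.0)  # recorded activations, original input
h_alt = np.maximum(x_alt @ W_in, 0.0)    # recorded activations, alternative input

# The logit difference is h @ w_diff, so its gradient w.r.t. h is w_diff.
grad = np.broadcast_to(w_diff, h_orig.shape)

scores = attribution_scores(h_orig, h_alt, grad)
selected = top_units(scores)
print(len(selected), "of", n_units, "units selected")
```

In a real transformer the readout is not linear in the MLP activations, so the gradient would come from a backward pass and the product is a first-order approximation of the effect of patching each unit.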
Explore
The 46 tasks · one example each
Each task is defined by minimal original / alternative input pairs whose<br>correct continuation flips. Click any task below to see one representative pair:<br>the original prompt and its correct continuation in green, and the alternative<br>prompt with the flipped continuation in red.
One example per task; full datasets (the count shown per task) are in the<br>code repository.
Result · Structural
A modular organization of reasoning systems
Figure 2. A modular organization of task-selective neurons. Pairwise<br>overlap of the top-0.1% task-selective neurons across the 46 tasks, averaged over six frontier<br>LLMs. Ribbons connect task pairs that share attributed neurons. The four colored blocks are<br>dense within domains and almost disjoint between them.
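The pairwise-overlap analysis behind a figure like this can be sketched as follows. The overlap metric is not specified on this page, so the Jaccard index used here is an assumption, as are the function names, the toy unit sets, and the way the within-to-between ratio is computed.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard overlap of two sets of unit indices (assumed metric)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def overlap_matrix(unit_sets):
    """Symmetric task-by-task matrix of pairwise overlaps."""
    n = len(unit_sets)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            m[i, j] = jaccard(unit_sets[i], unit_sets[j])
    return m

def within_between_ratio(m, domains):
    """Mean overlap for same-domain task pairs divided by cross-domain pairs."""
    domains = np.asarray(domains)
    same = domains[:, None] == domains[None, :]
    off_diag = ~np.eye(len(domains), dtype=bool)  # ignore each task vs. itself
    within = m[same & off_diag].mean()
    between = m[~same].mean()
    return within / between

# Toy data: four tasks in two domains, each with a small set of top units.
unit_sets = [{1, 2, 3, 4}, {2, 3, 4, 5}, {4, 10, 11, 12}, {10, 11, 12, 13}]
domains = ["formal", "formal", "social", "social"]

m = overlap_matrix(unit_sets)
ratio = within_between_ratio(m, domains)
print(round(ratio, 2))  # 8.4 for this toy data
```

A ratio well above 1 indicates that tasks in the same domain share more top units than tasks in different domains; clustering the same matrix and comparing the clusters to the domain labels would give an agreement score such as ARI.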
Tasks supported by the same brain network in humans are solved by...