Modular Cognitive Architecture Emerges in Large Language Models

Neural Mechanisms of Reasoning in LLMs

Pengrui Han·Jacob Andreas·Evelina Fedorenko†·Andrea Gregor de Varda†

Massachusetts Institute of Technology

† Co-senior authors

Preprint · 2026

Interactive demo (illustrative, not a results figure). The probe prompts are scripted. The left panel shows canonical brain networks from the literature. The right panel is a schematic node-link sketch: its per-domain depth profile is informed by the real Qwen2.5-32B attribution, but it does not render individual neurons. The quantitative results are the figures below.

Abstract

The human brain exhibits a striking degree of functional specialization, with distinct networks supporting language, formal reasoning, reasoning about other minds, and reasoning about the physical world. Is this modular organization a fundamental principle of how intelligent systems must be built, or an evolutionary accident specific to biological brains? Here, we test whether a similar organization emerges in Large Language Models, another class of intelligent systems created through a very different optimization process. Using circuit analyses across N = 46 tasks spanning four cognitive domains (language, formal reasoning, social reasoning, physical reasoning), we find that LLMs develop a modular architecture that mirrors the human brain: tasks drawing on the same network in humans recruit overlapping neurons in LLMs, whereas tasks drawing on different networks recruit distinct neurons. The convergent emergence of modularity in brains and neural networks suggests that it may be a fundamental property of intelligent systems.

46 reasoning tasks across four cognitive domains

Language · Formal · Physical · Social

Tasks sharing a brain network recruit the same neurons

4.3× more overlap within-domain · ARI = 0.78

Ablating a domain's neurons selectively breaks it

10.3× larger accuracy drop within- than cross-domain

The same structure holds in every model

6 frontier LLMs · 24B → 123B

Method · Localizing the units that support each task

We localize task-supporting units with attribution patching. For each of the 46 tasks across four cognitive domains, we build minimal original/alternative input pairs whose correct continuation flips. A unit's importance is its original-vs-alternative activation difference times the gradient of the original−alternative logit difference, summed over examples. We then quantify modular organization from the pairwise overlap of each task's top-0.1% units, and validate it causally by ablating those units and measuring cross-task transfer. We analyze six instruction-tuned LLMs (24B–123B, four families).
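The scoring rule in the paragraph above can be written compactly. The notation here is introduced for illustration and is not taken from the manuscript: a_u is the activation of unit u, the sum runs over the n contrastive pairs of a task, and Δℓ is assumed to be the logit of the original continuation minus the logit of the alternative continuation. The input at which the gradient is evaluated is not stated in this section, so the formula leaves it unspecified.

```latex
s_u \;=\; \sum_{i=1}^{n}
  \bigl( a_u(x_i^{\mathrm{orig}}) - a_u(x_i^{\mathrm{alt}}) \bigr)\,
  \frac{\partial\, \Delta\ell_i}{\partial a_u},
\qquad
\Delta\ell_i \;=\; \ell_i^{\mathrm{orig}} - \ell_i^{\mathrm{alt}}
```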

Figure 1. Identifying domain-specific functional organization in large language models. (A) The meta-dataset: 46 tasks spanning four cognitive domains: Language (8 tasks, 8,877 pairs), Formal reasoning (20 tasks, 19,941 pairs), Physical reasoning (9 tasks, 9,200 pairs), and Social reasoning (9 tasks, 11,412 pairs). Each domain is grounded in a well-characterized functional network of the human brain: the language network, the multiple-demand network, the intuitive-physics network, and the theory-of-mind network, respectively. (B) Pipeline. (1) For each task we construct minimal contrastive pairs of original and alternative inputs that elicit opposite correct continuations (here, addition vs. subtraction). (2) We run both inputs through the model and record activations at every MLP neuron. (3) Attribution patching scores each unit by the activation difference times the gradient of the original−alternative logit difference, yielding a causal estimate of its contribution. (4) We measure pairwise overlap of top-attributed units across tasks and validate the structure through causal ablation. (C) Example contrastive problems for each domain; original and alternative inputs are matched in surface form so that attribution reflects the reasoning-relevant contrast rather than format.
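Steps (3) and (4) of the pipeline can be sketched in a few lines of NumPy. This is a minimal illustration on random arrays, not the authors' code: the function names, the array layout (examples × units), and the choice to rank units by signed score rather than absolute value are all assumptions.

```python
import numpy as np

def attribution_scores(act_orig, act_alt, grad):
    """Per-unit score: (original - alternative activation) * gradient,
    summed over examples. All arrays have shape (n_examples, n_units).
    `grad` is assumed to hold the gradient of the original-minus-alternative
    logit difference with respect to each unit's activation."""
    return ((act_orig - act_alt) * grad).sum(axis=0)

def top_fraction_units(scores, fraction=0.001):
    """Sorted indices of the top `fraction` of units by score (at least one).
    Ranking by signed score is an assumption of this sketch."""
    k = max(1, int(round(fraction * scores.size)))
    # argpartition places the k largest scores in the last k positions.
    return np.sort(np.argpartition(scores, -k)[-k:])

# Synthetic stand-ins for recorded activations and gradients.
rng = np.random.default_rng(0)
n_examples, n_units = 32, 10_000
act_orig = rng.normal(size=(n_examples, n_units))
act_alt = rng.normal(size=(n_examples, n_units))
grad = rng.normal(size=(n_examples, n_units))

scores = attribution_scores(act_orig, act_alt, grad)
top_units = top_fraction_units(scores)  # top 0.1% of 10,000 units
```

In a real model the activations and gradients would come from forward and backward passes over each contrastive pair; the random arrays here only exercise the arithmetic.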

Explore · The 46 tasks, one example each

Each task is defined by minimal original/alternative input pairs whose correct continuation flips. The task explorer shows one representative pair per task: the original prompt with its correct continuation in green, and the alternative prompt with the flipped continuation in red.

Only one example per task is shown; the full datasets (with the pair count shown per task) are in the code repository.

Result · Structural · A modular organization of reasoning systems

Figure 2. A modular organization of task-selective neurons. Pairwise overlap of the top-0.1% task-selective neurons across the 46 tasks, averaged over six frontier LLMs. Ribbons connect task pairs that share attributed neurons. The four colored blocks are dense within domains and almost disjoint between them.
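The pairwise-overlap analysis behind a figure of this kind can be sketched as follows. The overlap measure is not specified in this section, so Jaccard similarity is used here as an assumption; the toy unit sets, domain labels, and function names are hypothetical and exist only to show the computation.

```python
import numpy as np
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two collections of unit indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def overlap_matrix(top_units):
    """Symmetric task-by-task overlap matrix with ones on the diagonal."""
    n = len(top_units)
    m = np.eye(n)
    for i, j in combinations(range(n), 2):
        m[i, j] = m[j, i] = jaccard(top_units[i], top_units[j])
    return m

def within_between_ratio(m, domains):
    """Mean overlap of same-domain task pairs divided by the mean overlap
    of different-domain task pairs."""
    within, between = [], []
    for i, j in combinations(range(len(domains)), 2):
        (within if domains[i] == domains[j] else between).append(m[i, j])
    return float(np.mean(within) / np.mean(between))

# Toy data: four tasks, two per domain, each with four "top" units.
top_units = [
    [1, 2, 3, 4],     # formal task A
    [2, 3, 4, 5],     # formal task B
    [4, 10, 11, 12],  # social task A
    [4, 11, 12, 13],  # social task B
]
domains = ["formal", "formal", "social", "social"]

m = overlap_matrix(top_units)
ratio = within_between_ratio(m, domains)
```

A ratio above 1 means tasks share more top units within a domain than across domains, which is the pattern the figure describes as dense blocks that are almost disjoint from one another.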

Tasks supported by the same brain network in humans are solved by...
