A Geometric Calculator Inside a Neural Network

The Neural Geometry Series

We found a neural mechanism that operates over manifolds: a general-purpose addition module inside Llama 3.1 8B which manipulates circular representations of numbers.

Authors

Sheridan Feucht *,1,2

Ekdeep Singh Lubana †,1

Tal Haklay *,1,3

Thomas Fel †,1

Usha Bhalla 1,4

Atticus Geiger †,1

Daniel Wurgaft 1,5

Can Rager 1

Raphaël Sarfati 1

Jack Merullo 1

Thomas McGrath 1

Owen Lewis 1

* Equal contribution

† Equal senior contribution

1 Goodfire

2 Northeastern University

3 Technion IIT

4 Harvard University

5 Stanford University

Published

May 14, 2026

Full Paper

Read on arXiv →

Imagine that it's August, and you have to schedule an appointment in six months. How do you work out that your appointment must be in February? Do you verbally walk through the months in your head one by one: August, September, October, November, December, January, February? Or do you conjure a mental image of months arranged in a circle and see that February is opposite to August?

Whatever your personal strategy is, we suspect that it isn't shared with the language model Llama 3.1 8B. In fact, Llama converts August to the number 8, solves the addition problem 8 + 6 = 14, and then converts back to the month February — all in a single forward pass of the network. Each of these numbers is represented geometrically, using Fourier features that draw out circles in activation space.
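As a minimal sketch of the three steps described above (month to number, addition, number back to month), the following Python snippet walks through the August example. The function names are hypothetical, and the wrap-around is written as ordinary modular arithmetic, which is an assumption of this sketch: the post does not say how Llama maps 14 back to February.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

def month_to_number(month: str) -> int:
    """Map a month name to its 1-based position (August -> 8)."""
    return MONTHS.index(month) + 1

def number_to_month(n: int) -> str:
    """Map any positive integer back to a month, wrapping past 12 (14 -> February)."""
    return MONTHS[(n - 1) % 12]

def add_months(month: str, offset: int) -> str:
    """Convert to a number, add as plain integers, then convert back."""
    total = month_to_number(month) + offset  # e.g. 8 + 6 = 14
    return number_to_month(total)

print(add_months("August", 6))  # February
```

The point of the sketch is that the middle step is plain integer addition, so the same step can serve any task that converts its inputs to numbers first.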

This solution may appear odd at first, but it is an efficient and elegant reuse of computational machinery. In fact, Llama seems to use this same internal "addition module" across several different tasks which share an addition-like structure.

This case study provides a window into how understanding the geometry of neural representations unlocks a deeper understanding of neural computation – which together explain how a model behaves and generalizes. Understanding this machinery paves the way for better debugging, control, and design of AI.

A general-purpose addition module in Llama

The same addition mechanism is shared across several different tasks in Llama 3.1 8B.

As part of our investigations into neural geometry, we wanted to see how a language model reasons internally about questions like "What is 7 + 9?" and "What is two days after Friday?". To our surprise, we found that a single internal mechanism computes the answer to both of these questions — and also to questions about similar cyclic concepts, e.g. months.

Specifically, we found an "addition module" in layer 18 of Llama 3.1 8B, which we'll dive into for the rest of this post. We discovered this module by tracking the flow of information across layers and token positions, and then used causal methods to validate that it works across different tasks (see the paper for details).

Why would a neural network use addition to reason about months and days? During training, a model has only a finite number of parameters with which to develop new capabilities. This optimization pressure incentivizes reusing parameters across distinct but related tasks.

Numbers as circles

Before we can understand how the addition module works, we first need to understand how language models represent numbers – the inputs and outputs of the module. You might think that language models have something like an internal ruler, with numbers lying on a straight line in activation space. Or maybe they use binary numbers, like computers?

The answer is none of the above.

Instead, language models use a group of circles in activation space to represent a single number. Each circle corresponds to the number modulo a second number, i.e., the remainder after division.[1] For example, the number 17 would be represented as a 1 on the mod-2 circle, 2 on the mod-5 circle, 7 on the mod-10 circle, and 17 on the mod-100 circle.[2] Several prior works have established that circular features exist across multiple different LLMs.[3]

[1] This is something like a variant of a residue number system without the requirement of the moduli being coprime.

[2] Why have mod-2, mod-5, and mod-10 circles if you already have a mod-100 circle that can represent all the numbers between 1 and 100? This is something of an open question, but we think it has to do with the fact that large circles are somewhat imprecise, so e.g. the mod-2 parity feature helps distinguish 17 from 16 and 18 (which are very close together on the mod-100 circle). See Weber's law.

[3] Nanda et al. 2023, Zhong et al. 2023, Zhou et al. 2024, Zhou et al. 2025, Kantamneni and Tegmark 2025, Levy and Geva 2025, Fu et al. 2026

Figure 3 from Kantamneni & Tegmark (2025). GPT-J encodes numbers on circles.
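To make the residue picture concrete, here is a small Python sketch that computes the residues of 17 for the four moduli used in the example and places a number on an idealized unit circle for each one. The unit-circle coordinates and the function names are assumptions for illustration; real activations live in a learned subspace and are noisier than this.

```python
import math

MODULI = (2, 5, 10, 100)  # the periods named in the example above

def residues(n, moduli=MODULI):
    """Remainder of n for each modulus, e.g. 17 -> 1 (mod 2), 2 (mod 5), ..."""
    return {m: n % m for m in moduli}

def circle_point(n, period):
    """Idealized position of n on a unit circle with the given period."""
    angle = 2 * math.pi * (n % period) / period
    return (math.cos(angle), math.sin(angle))

def circle_distance(a, b, period):
    """Straight-line distance between a and b on the mod-`period` circle."""
    return math.dist(circle_point(a, period), circle_point(b, period))

print(residues(17))  # {2: 1, 5: 2, 10: 7, 100: 17}
for period in MODULI:
    print(period, round(circle_distance(17, 18, period), 3))
```

On these idealized circles, 17 and 18 are about 0.06 apart on the mod-100 circle but a full diameter (2.0) apart on the mod-2 circle, which fits the suggestion that small circles help separate neighbors that a large circle places close together.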

Using a bunch of circles to represent a number probably seems like an alien solution, but it is a common mathematical technique known as a Fourier decomposition (see the paper for more detail).
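One textbook property of circular (Fourier) features is that adding two numbers corresponds to composing two rotations, so the sum modulo the period can be read off the resulting angle. The sketch below illustrates that property with complex numbers on the unit circle. It is a generic illustration with hypothetical helper names, not a description of the computation inside Llama's layer-18 module, which is covered in the paper.

```python
import cmath
import math

def encode(n, period):
    """Represent n as a point on the unit circle: exp(2*pi*i*n / period)."""
    return cmath.exp(2j * math.pi * n / period)

def decode(z, period):
    """Recover the residue from a point on the circle by reading its angle."""
    angle = cmath.phase(z) % (2 * math.pi)  # map (-pi, pi] into [0, 2*pi)
    return round(angle / (2 * math.pi) * period) % period

def add_on_circle(a, b, period):
    """Multiplying unit complex numbers adds their angles, giving (a + b) mod period."""
    return decode(encode(a, period) * encode(b, period), period)

# "What is 7 + 9?" read off each circle separately
for period in (2, 5, 10, 100):
    print(period, add_on_circle(7, 9, period))
```

For 7 + 9 the four circles give 0, 1, 6, and 16, which are exactly the residues of 16 for the moduli 2, 5, 10, and 100.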

Each of the inputs and the output of the addition module is...
