Show HN: A 155K-param transformer builds a map of a world it's never shown

Inside the Model · The World Inside — watch a machine build a world, then change its mind

A model that only ever read an agent's movement symbols built an internal map of its world, all on its own. Here you read what it sees, decode that map from its mind, then edit it and watch its behavior follow the false belief.


ACT I

What the model reads · its entire universe

A stream of moves. No picture. No map. No coordinates. Just these symbols, one after another.
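As a rough illustration of what such a stream could look like, the sketch below walks an agent around a small grid and records only the move symbols. The grid size, the four symbols `N`/`S`/`E`/`W`, and the helper names are assumptions made for this example; the page does not state the model's actual vocabulary or world layout.

```python
import random

# Hypothetical move vocabulary; the real model's symbols are not given in the text.
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}

def legal_moves(x, y, size):
    """Moves that keep the agent inside a size x size grid."""
    return [m for m, (dx, dy) in MOVES.items()
            if 0 <= x + dx < size and 0 <= y + dy < size]

def random_walk(size=7, steps=20, seed=0):
    """Return (symbols, positions). A model would be trained on `symbols` only."""
    rng = random.Random(seed)
    x, y = rng.randrange(size), rng.randrange(size)
    symbols, positions = [], [(x, y)]
    for _ in range(steps):
        m = rng.choice(legal_moves(x, y, size))  # never walk through a wall
        dx, dy = MOVES[m]
        x, y = x + dx, y + dy
        symbols.append(m)
        positions.append((x, y))
    return symbols, positions

symbols, positions = random_walk()
print(" ".join(symbols))  # the only thing the model would ever read
```

The `positions` list is kept aside purely as ground truth for later decoding; it never enters the training stream.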

What's inside its mind · decoded live

We point a "mind-reader" (a linear probe) at its hidden activations.

Its mind is a black box… for now.

Legend: where it actually is · where it believes it is


This isn't a trick. 🜂

A 155,000-parameter network was trained to do one thing: predict the next move-symbol. It was never shown a grid, a map, or a single coordinate. Yet to do its job it built a model of the world the symbols describe — and kept it in its activations, where a simple linear probe can read it out.
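A linear probe of the kind described above can be sketched in a few lines. This is a minimal illustration, not the author's code: the activations here are synthetic stand-ins (one random direction per grid cell plus noise), the sizes `n_cells` and `d_model` are assumed, and the probe is fit by ordinary least squares rather than whatever training procedure the project actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, d_model, n_samples = 49, 128, 4000  # hypothetical sizes

# Stand-in for real hidden activations: each cell gets a random direction, plus noise.
cell_dirs = rng.normal(size=(n_cells, d_model))
cells = rng.integers(0, n_cells, size=n_samples)
acts = cell_dirs[cells] + 0.1 * rng.normal(size=(n_samples, d_model))

train, test = slice(0, 3000), slice(3000, None)

def fit_linear_probe(X, y, n_classes):
    """Least-squares map from activations (plus a bias column) to one-hot cell labels."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    Y = np.eye(n_classes)[y]
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def probe_predict(W, X):
    """Decode the most likely cell for each activation vector."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ W).argmax(axis=1)

W = fit_linear_probe(acts[train], cells[train], n_classes=n_cells)
accuracy = (probe_predict(W, acts[test]) == cells[test]).mean()
print(f"probe accuracy: {accuracy:.3f} (chance {1 / n_cells:.3f})")
```

The point of keeping the probe linear is that any position information it recovers must already be laid out in a simple form in the activations, rather than being computed by the probe itself.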

When you clicked a cell, you overwrote its internal sense of place (activation patching). It then acted on the false belief: it refused exits that are walls only at the position it believes it occupies, and it "saw" a lamp that isn't there. The representation is causal, not decorative.
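The idea behind such a belief edit can be shown with a toy stand-in. In the sketch below, a hidden vector carries a position code, a linear head reads legal moves off it, and patching swaps the code for one cell with the code for another. Everything here is an assumption for illustration: the 7×7 grid, the random `cell_dirs` codes, the least-squares `head`, and the function names are hypothetical and are not taken from the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
size, d_model = 7, 128                      # hypothetical grid and hidden sizes
n_cells = size * size
MOVES = [(0, -1), (0, 1), (1, 0), (-1, 0)]  # N, S, E, W

def legal_mask(cell):
    """1.0 for each move that stays on the grid from this cell, else 0.0."""
    x, y = cell % size, cell // size
    return np.array([0 <= x + dx < size and 0 <= y + dy < size
                     for dx, dy in MOVES], dtype=float)

# Toy position code and a linear head that reads legal moves from it.
cell_dirs = rng.normal(size=(n_cells, d_model))
targets = np.stack([legal_mask(c) for c in range(n_cells)])
head, *_ = np.linalg.lstsq(cell_dirs, targets, rcond=None)

def predict_legal(h):
    """Which moves the toy head considers legal given hidden state h."""
    return (h @ head) > 0.5

def patch_position(h, believed_from, believed_to):
    """Overwrite the position component of h, leaving the rest untouched."""
    return h - cell_dirs[believed_from] + cell_dirs[believed_to]

true_cell = 3 * size + 3  # agent really stands in the center
fake_cell = 0             # we make it believe it is in the top-left corner
h = cell_dirs[true_cell] + 0.02 * rng.normal(size=d_model)

before = predict_legal(h)
after = predict_legal(patch_position(h, true_cell, fake_cell))
print("before patch:", before)
print("after patch: ", after)
```

After the patch, the head rules out the north and west moves even though the agent has not moved, which mirrors the "refusing exits" behavior described above. In a real transformer the patch would be applied to a chosen layer's activations during the forward pass.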

measured on this very model:

✓ position decodable 98.8% (chance 2%)

✓ predicts only legal moves 100%

✓ belief-edit changes behavior 100%

✓ phantom-lamp on command 99.7%

Related patterns show up in larger systems. Othello-GPT, trained only on Othello moves, builds an internal board representation (Li et al.; Nanda), and research finds that Llama-class models encode linear representations of real-world place and time (Gurnee & Tegmark). This toy model is small enough to verify end to end, and to run live in your browser.

Next in this series: render the real-world map of place and time hidden inside an actual LLM.

Refs: Othello-GPT world model · Language Models Represent Space & Time · Golden Gate Claude

A note on words: here, "belief" means a decoded internal state representation, not consciousness or human-like understanding. This is a controlled toy grid-world experiment. It shows that this model learned a measurable, causally relevant world-state representation; it does not show that all large models have cleanly readable or editable beliefs.

