Show HN: A 155K-param transformer builds a map of a world it's never shown

ankurchrungoo2 pts0 comments

Inside the Model · The World Inside — watch a machine build a world, then change its mind

A model that only ever read an agent's movement symbols built a map of its world inside, all on its own. Here you read what it sees, decode that map from its mind, then edit it and watch its behavior follow the false belief.


ACT I

What the model reads · its entire universe

A stream of moves. No picture. No map. No coordinates. Just these symbols, one after another.
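To make "a stream of moves" concrete, here is a minimal sketch of how such a stream could be produced. The symbol set (`N`, `S`, `E`, `W`), the 7×7 grid, and the function names are assumptions for illustration; the page does not specify the demo's actual vocabulary or world size.

```python
import random

# Hypothetical symbol set and grid size; the demo's real ones are not stated.
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}
WIDTH, HEIGHT = 7, 7

def legal_moves(x, y):
    """Return the move symbols that keep the agent inside the grid."""
    return [s for s, (dx, dy) in MOVES.items()
            if 0 <= x + dx < WIDTH and 0 <= y + dy < HEIGHT]

def sample_walk(steps, seed=0):
    """Random walk using only legal moves; returns (symbols, positions)."""
    rng = random.Random(seed)
    x, y = rng.randrange(WIDTH), rng.randrange(HEIGHT)
    symbols, positions = [], [(x, y)]
    for _ in range(steps):
        s = rng.choice(legal_moves(x, y))
        dx, dy = MOVES[s]
        x, y = x + dx, y + dy
        symbols.append(s)
        positions.append((x, y))
    return symbols, positions

symbols, positions = sample_walk(20)
print(" ".join(symbols))  # the symbol stream is all a model would be given
```

The `positions` list is kept only as ground truth for later evaluation. In this sketch it would never be part of the training input.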

What's inside its mind · decoded live

We point a "mind-reader" (a linear probe) at its hidden activations.
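A linear probe is just a linear read-out fitted on recorded activations. The sketch below is not the demo's probe: it uses synthetic stand-in "activations" (one noisy random vector per cell) and a multiclass perceptron as one simple way to fit the read-out. The cell count, activation width, and all names are assumptions.

```python
import random

def make_synthetic_activations(n_cells=9, dim=16, per_cell=30, noise=0.1, seed=0):
    """Stand-in data: each cell gets a random vector in activation space, plus noise."""
    rng = random.Random(seed)
    centers = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_cells)]
    data = []
    for cell, center in enumerate(centers):
        for _ in range(per_cell):
            data.append(([v + rng.gauss(0, noise) for v in center], cell))
    rng.shuffle(data)
    return data

def scores(W, x):
    """Linear read-out: one dot product per cell, no hidden layer."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]

def predict(W, x):
    s = scores(W, x)
    return s.index(max(s))

def train_probe(data, n_cells, dim, epochs=20):
    """Fit the probe with multiclass perceptron updates."""
    W = [[0.0] * dim for _ in range(n_cells)]
    for _ in range(epochs):
        for x, y in data:
            guess = predict(W, x)
            if guess != y:
                for i in range(dim):
                    W[y][i] += x[i]      # pull the true cell's weights toward x
                    W[guess][i] -= x[i]  # push the wrongly chosen cell's weights away
    return W

data = make_synthetic_activations()
train, test = data[:200], data[200:]
W = train_probe(train, n_cells=9, dim=16)
accuracy = sum(predict(W, x) == y for x, y in test) / len(test)
print(f"probe accuracy: {accuracy:.2f} (chance {1 / 9:.2f})")
```

The point of keeping the probe linear is that it has little capacity of its own: if it decodes position well above chance on held-out data, the information was likely already laid out in the activations rather than computed by the probe.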

[Interactive view: a grid showing where it actually is and where it believes it is, with a lamp-sense readout.]

This isn't a trick.

A 155,000-parameter network was trained to do one thing: predict the next move-symbol. It was never shown a grid, a map, or a single coordinate. Yet to do its job it built a model of the world the symbols describe, and kept it in its activations, where a simple linear probe can read it out.

When you clicked a cell, you overwrote its internal sense of place (activation patching). It then acted on the false belief: refusing real exits that would be walls at the position it believes it occupies, and "seeing" a lamp that isn't there. The representation is causal, not decorative.
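The mechanics of activation patching can be sketched with a two-stage toy: run the forward pass, swap the intermediate state for one taken from a different run, then let the rest of the computation continue. Everything here is a hypothetical stand-in (a 3×3 grid, a one-hot hidden state, hand-written `encode` and `decode`, a known start cell), not the demo's transformer.

```python
# Hypothetical 3x3 world; sizes, symbols, and function names are illustrative only.
MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}
SIZE = 3

def encode(start, symbols):
    """Stage 1: fold the move stream into a hidden state (one-hot over cells)."""
    x, y = start
    for s in symbols:
        dx, dy = MOVES[s]
        x, y = x + dx, y + dy
    hidden = [0.0] * (SIZE * SIZE)
    hidden[y * SIZE + x] = 1.0
    return hidden

def decode(hidden):
    """Stage 2: read the hidden state and output the moves it treats as legal."""
    cell = hidden.index(max(hidden))
    x, y = cell % SIZE, cell // SIZE
    return sorted(s for s, (dx, dy) in MOVES.items()
                  if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE)

def run(start, symbols, patch=None):
    """Forward pass; if `patch` is given, it replaces the hidden state mid-run."""
    hidden = encode(start, symbols)
    if patch is not None:
        hidden = patch  # the activation patch: overwrite the belief
    return decode(hidden)

moves_seen = ["E", "S"]               # the agent really ends up in the center cell
donor = encode((0, 0), [])            # hidden state from a run that sits in a corner
clean = run((0, 0), moves_seen)
patched = run((0, 0), moves_seen, patch=donor)
print("clean:  ", clean)
print("patched:", patched)
```

In the clean run the center cell allows all four moves. In the patched run the same input yields only the corner's two moves, so the output follows the edited state rather than the true position. That input-held-fixed, state-swapped comparison is what supports a causal reading.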

measured on this very model:

✓ position decodable 98.8% (chance 2%)

✓ predicts only legal moves 100%

✓ belief-edit changes behavior 100%

✓ phantom-lamp on command 99.7%

Related patterns show up in larger systems. Othello-GPT, trained only on Othello moves, builds an internal board representation (Li et al.; Nanda), and research finds that Llama-class models encode linear maps of real-world place and time (Gurnee & Tegmark). This toy model is small enough to verify end to end, and to run live in your browser.

Next in this series: render the real-world map of place and time hidden inside an actual LLM.

Refs: Othello-GPT world model · Language Models Represent Space & Time · Golden Gate Claude

A note on words: here, "belief" means a decoded internal state representation, not consciousness or human-like understanding. This is a controlled toy grid-world experiment. It shows that this model learned a measurable, causally relevant world-state representation, not that all large models have cleanly readable or editable beliefs.

