The Anatomy of an LLM | Interactive Visual Guide to How Language Models Work<br>Introduction
Large language models can feel like black boxes. You type a prompt, something smart comes back, and<br>somewhere in the middle billions of parameters supposedly did "AI".
This guide opens that box.
We will follow one chain from beginning to end. First, text is split into tokens. Those tokens become<br>vectors. The vectors move through layers of attention and feed-forward networks. At the end, the model<br>produces scores for possible next tokens, and a decoding strategy chooses what comes out.
The goal is not to memorize every formula. The goal is to understand what changes at each step, and why<br>that step exists at all.
If you are looking for how LLMs work, how transformers work, or how attention, tokenization, KV cache,<br>and quantization fit together, this page keeps those ideas connected in one visual path.
By the end, you should be able to trace the full path:<br>01 Text<br>02 Tokens<br>03 Vectors<br>04 Transformer blocks<br>05 Logits<br>06 Sampling<br>07 Output
And once you can trace that path, the black box becomes a lot smaller.
What you get<br>Concrete visuals, small numbers first, and interactive controls that make each transformation inspectable.
How to use it<br>Scroll top to bottom as a single narrative, or jump between chapters for a specific concept.
Who made this
Roy van Rijn working at openvalue
Table of contents<br>01 Tokenization<br>02 Vector Embeddings<br>03 Neuron Activation<br>04 Feed-Forward Neural Network<br>05 Logits and Sampling<br>06 Backpropagation<br>07 Optimizers<br>08 Attention: Q, K, and V<br>09 Multi-Head Attention<br>10 RoPE<br>11 Transformer Block<br>12 Training Phases<br>13 Post-Training<br>14 Context and KV Cache<br>15 Quantization
Chapter 01<br>Tokenization<br>Before a model can think about text, the text has to become numbers.
A language model does not read words and sentences the way we do. It reads a sequence of token IDs:<br>integers produced by a tokenizer.
That makes tokenization the real entrance to the model. Everything after this point works with<br>numbers, not raw characters.
A token can be a whole word, part of a word, punctuation, whitespace, or a piece of something strange<br>like code, emoji, or a name. This is why tokenization often looks a bit weird when you first see it.<br>The tokenizer is not trying to split text the way a human would. It is trying to represent text<br>efficiently using a fixed vocabulary.
If every token were a full word, the vocabulary would explode. If every token were a single character<br>or byte, every sentence would become very long. Modern tokenizers live between those extremes.
Slicing up the text<br>Before text can enter a language model, it has to be rewritten as numbers.<br>Tokenization is the step that does this. It splits text into small reusable pieces called tokens.<br>A token can be a whole word, part of a word, punctuation, a number, or even a space plus the start of the next word.<br>Each token has an entry in the tokenizer's vocabulary and is replaced by its corresponding integer ID. From that<br>point on, the model is no longer working with characters directly. It sees an ordered list of token IDs.
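The lookup described above can be sketched in a few lines. This is a toy illustration, not how o200k_base or any production tokenizer works: the vocabulary, the IDs, and the names `build_vocab`, `encode`, and `decode` are invented for this example, and greedy longest-match stands in for the learned merge rules a real tokenizer uses.

```python
def build_vocab(pieces, alphabet):
    """Assign consecutive integer IDs: multi-character pieces first, then single characters."""
    vocab = {}
    for piece in list(pieces) + sorted(alphabet):
        if piece not in vocab:
            vocab[piece] = len(vocab)
    return vocab


def encode(text, vocab):
    """Greedy longest-match: at each position, take the longest piece found in the vocabulary."""
    max_len = max(len(piece) for piece in vocab)
    ids = []
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += size
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[pos]!r}")
    return ids


def decode(ids, vocab):
    """Map each ID back to its piece and concatenate."""
    id_to_piece = {token_id: piece for piece, token_id in vocab.items()}
    return "".join(id_to_piece[token_id] for token_id in ids)


text = "tokenization is fun"
# Hypothetical vocabulary: four reusable pieces plus every character in the text.
vocab = build_vocab(["token", "ization", " is", " fun"], set(text))

ids = encode(text, vocab)
print(ids)                 # the model only ever sees this list of integers
print(decode(ids, vocab))  # decoding restores the original text
```

Note that " is" and " fun" carry their leading space inside the piece, which mirrors the "space plus the start of the next word" case mentioned above.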
Why not just use words?<br>Whole words are too rigid. New names, typos, code, inflections, compound words, and multilingual text would<br>constantly produce words the model has never seen before.
Why not just use letters or bytes?<br>That solves the "unknown word" problem, but makes every input much longer. More pieces means more work for the model<br>and less context fits in the same window. Subword tokens are the reasonable compromise: common text stays compact, while unusual text can still be built from smaller pieces.
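The two extremes can be compared directly. The sketch below is illustrative only: `word_level_encode`, `byte_level_encode`, and the five-entry `word_vocab` are made up for this example and do not correspond to any real tokenizer.

```python
def word_level_encode(text, vocab):
    """One ID per whitespace-separated word; unseen words collapse to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]


def byte_level_encode(text):
    """One ID per UTF-8 byte; nothing is ever unknown, but sequences get long."""
    return list(text.encode("utf-8"))


# Hypothetical word-level vocabulary that has never seen "tokenizers".
word_vocab = {"<unk>": 0, "the": 1, "model": 2, "reads": 3, "tokens": 4}
unseen = "the model reads tokenizers"

word_ids = word_level_encode(unseen, word_vocab)
byte_ids = byte_level_encode(unseen)

print(word_ids)                       # the last word is lost to <unk>
print(len(word_ids), len(byte_ids))   # short but lossy vs. lossless but long
```

The word-level version is compact but throws away the unfamiliar word entirely. The byte-level version keeps everything but spends one position per byte. Subword vocabularies aim for the space between those two results.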
Below you can experiment with OpenAI's o200k_base tokenizer. Try switching sentences and watch where the<br>boundaries land.<br>Later in this explainer, when the model predicts the next token, it predicts over this same vocabulary.<br>Technical note: the examples below are generated with tiktoken using the o200k_base encoding.<br>Example sentence: Classic cognition quote (long sentence)<br>Raw sentence<br>If the human brain were so simple that we could understand it, we would be so simple that we couldn't.
102 characters<br>22 tokens<br>About 4.6 chars/token on average (102 / 22)
Tokenized result<br>If<br>#3335
·the<br>#290
·human<br>#5396
·brain<br>#12891
·were<br>#1504
·so<br>#813
·simple<br>#4705
·that<br>#484
·we<br>#581
·could<br>#2023
·understand<br>#4218
·it<br>#480
,<br>#11
·we<br>#581
·would<br>#1481
·be<br>#413
·so<br>#813
·simple<br>#4705
·that<br>#484
·we<br>#581
·couldn't<br>#21149
.<br>#13
In the listing above, · marks a leading space that is part of the token, and # precedes the token ID.
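As a sanity check, the counts reported for this sentence can be recomputed from the listing itself. The pairs below are copied from the tokenized result, with a leading space wherever the listing shows a whitespace marker. The pieces for IDs 11 and 13 are taken to be "," and ".", since that is what the raw sentence requires at those positions. The variable names are only illustrative.

```python
# (piece, token ID) pairs transcribed from the tokenized result above.
TOKENS = [
    ("If", 3335), (" the", 290), (" human", 5396), (" brain", 12891),
    (" were", 1504), (" so", 813), (" simple", 4705), (" that", 484),
    (" we", 581), (" could", 2023), (" understand", 4218), (" it", 480),
    (",", 11),  # assumed piece for ID 11
    (" we", 581), (" would", 1481), (" be", 413), (" so", 813),
    (" simple", 4705), (" that", 484), (" we", 581), (" couldn't", 21149),
    (".", 13),  # assumed piece for ID 13
]

sentence = (
    "If the human brain were so simple that we could understand it, "
    "we would be so simple that we couldn't."
)

# Concatenating the pieces in order should give back the raw sentence.
rebuilt = "".join(piece for piece, _ in TOKENS)

num_chars = len(rebuilt)
num_tokens = len(TOKENS)
chars_per_token = num_chars / num_tokens
unique_ids = len({token_id for _, token_id in TOKENS})

print(rebuilt == sentence)
print(num_chars, num_tokens, round(chars_per_token, 2), unique_ids)
```

Repeated words such as " we", " so", " simple", and " that" map to the same ID each time they appear, so the sentence uses fewer distinct IDs than it has tokens.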
Important takeaway
Tokenization is not just preprocessing. It determines what the model can see in one context window,<br>how expensive your text is, and which pieces the model is allowed to predict next.
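To make the window and cost points concrete, here is a minimal sketch of the arithmetic. The function name `token_budget`, the 8,192-token window, and the price of 2.00 per million tokens are placeholder assumptions for illustration, not figures for any particular model or provider.

```python
def token_budget(num_tokens, context_window, price_per_million_tokens):
    """Return (fits, remaining_tokens, cost) for a prompt of num_tokens tokens."""
    remaining = context_window - num_tokens
    cost = num_tokens / 1_000_000 * price_per_million_tokens
    return remaining >= 0, remaining, cost


# 22 is the token count of the example sentence; the other numbers are hypothetical.
fits, remaining, cost = token_budget(
    22, context_window=8_192, price_per_million_tokens=2.00
)
print(fits, remaining, cost)
```

Both results depend on the token count, not the character count, so the same text can fit in or overflow a window, and cost more or less, depending on how the tokenizer splits it.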
One word is not one token
Different models use different tokenizers. The same sentence can become a different number of tokens<br>depending on the model.
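A toy comparison shows how the same text can produce different token counts. Both vocabularies and the `greedy_tokenize` helper are invented for this sketch. Real tokenizers learn their pieces from data and split by merge rules rather than by this simple longest-match loop.

```python
def greedy_tokenize(text, pieces):
    """Repeatedly take the longest piece that matches at the current position.

    Falls back to a single character when no piece matches.
    """
    tokens, pos = [], 0
    while pos < len(text):
        match = max(
            (piece for piece in pieces if text.startswith(piece, pos)),
            key=len,
            default=text[pos],
        )
        tokens.append(match)
        pos += len(match)
    return tokens


text = "unbelievable results"

# Two hypothetical vocabularies standing in for two different models' tokenizers.
tokenizer_a = {"un", "believ", "able", " results"}
tokenizer_b = {"unbelievable", " res", "ults"}

tokens_a = greedy_tokenize(text, tokenizer_a)
tokens_b = greedy_tokenize(text, tokenizer_b)

print(len(tokens_a), tokens_a)
print(len(tokens_b), tokens_b)
```

Neither split is more correct than the other. Each reflects which pieces its vocabulary happens to contain, which is why token counts are only comparable within a single tokenizer.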
Chapter 02<br>Vector Embeddings<br>Token IDs are just...