The Anatomy of an LLM

Introduction

Large language models can feel like black boxes. You type a prompt, something smart comes back, and somewhere in the middle billions of parameters supposedly did "AI".

This guide opens that box.

We will follow one chain from beginning to end. First, text is split into tokens. Those tokens become vectors. The vectors move through layers of attention and feed-forward networks. At the end, the model produces scores for possible next tokens, and a decoding strategy chooses what comes out.

The goal is not to memorize every formula. The goal is to understand what changes at each step, and why that step exists at all.

If you are looking for how LLMs work, how transformers work, or how attention, tokenization, KV cache, and quantization fit together, this page keeps those ideas connected in one visual path.

By the end, you should be able to trace the full path:

01 Text
02 Tokens
03 Vectors
04 Transformer blocks
05 Logits
06 Sampling
07 Output

And once you can trace that path, the black box becomes a lot smaller.
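To make that path concrete before the chapters fill in the details, here is a minimal sketch of all seven steps in plain Python. Everything in it is a made-up toy: the four-word vocabulary, the two-dimensional embeddings, and especially `transformer_stand_in`, which only averages vectors where a real model would run attention and feed-forward layers. It illustrates the shape of the data at each step, not how a real LLM computes.

```python
import math

# Hypothetical toy vocabulary; real vocabularies are vastly larger.
VOCAB = ["the", "cat", "sat", "mat"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

# Made-up 2-dimensional embeddings, one row per token ID.
EMBEDDINGS = [
    [1.0, 0.0],   # the
    [0.0, 1.0],   # cat
    [1.0, 1.0],   # sat
    [-1.0, 0.5],  # mat
]

def tokenize(text):
    """Text -> token IDs (toy whitespace split, not a subword tokenizer)."""
    return [TOKEN_TO_ID[word] for word in text.split()]

def embed(token_ids):
    """Token IDs -> vectors, by table lookup."""
    return [EMBEDDINGS[i] for i in token_ids]

def transformer_stand_in(vectors):
    """Placeholder for the transformer blocks: average the input vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def to_logits(hidden):
    """One score per vocabulary entry: dot product with each embedding row."""
    return [sum(h * e for h, e in zip(hidden, row)) for row in EMBEDDINGS]

def softmax(logits):
    """Turn scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

def greedy_pick(probs):
    """Simplest decoding strategy: take the most probable token ID."""
    return max(range(len(probs)), key=probs.__getitem__)

ids = tokenize("the cat")                   # 01 Text -> 02 Tokens
hidden = transformer_stand_in(embed(ids))   # 03 Vectors -> 04 Transformer blocks
probs = softmax(to_logits(hidden))          # 05 Logits
next_token = VOCAB[greedy_pick(probs)]      # 06 Sampling -> 07 Output
print(ids, next_token)
```

With these invented numbers the toy picks "sat", but only because the embeddings were chosen by hand. In a real model those values are learned during training.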

What you get: Concrete visuals, small numbers first, and interactive controls that make each transformation inspectable.

How to use it: Scroll top to bottom as a single narrative, or jump between chapters for a specific concept.

Who made this

Roy van Rijn working at openvalue

Table of contents

01 Tokenization
02 Vector Embeddings
03 Neuron Activation
04 Feed-Forward Neural Network
05 Logits and Sampling
06 Backpropagation
07 Optimizers
08 Attention: Q, K, and V
09 Multi-Head Attention
10 RoPE
11 Transformer Block
12 Training Phases
13 Post-Training
14 Context and KV Cache
15 Quantization

Chapter 01: Tokenization

Before a model can think about text, the text has to become numbers.

A language model does not read words and sentences the way we do. It reads a sequence of token IDs: integers produced by a tokenizer.

That makes tokenization the real entrance to the model. Everything after this point works with numbers, not raw characters.

A token can be a whole word, part of a word, punctuation, whitespace, or a piece of something strange like code, emoji, or a name. This is why tokenization often looks a bit weird when you first see it. The tokenizer is not trying to split text the way a human would. It is trying to represent text efficiently using a fixed vocabulary.

If every token were a full word, the vocabulary would explode. If every token were a single character or byte, every sentence would become very long. Modern tokenizers live between those extremes.
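The length side of that trade-off is easy to measure. The sketch below counts how many units the same text becomes when split into whitespace-separated words, characters, or UTF-8 bytes. The whitespace split is only a stand-in for a word-level tokenizer, and `count_units` is a helper name invented for this example.

```python
text = ("If the human brain were so simple that we could understand it, "
        "we would be so simple that we couldn't.")

def count_units(s):
    """Count the sequence length under three naive splitting schemes."""
    return {
        "words": len(s.split()),            # whitespace-separated words
        "characters": len(s),               # Unicode code points
        "bytes": len(s.encode("utf-8")),    # UTF-8 bytes
    }

english = count_units(text)
# Non-ASCII characters take more than one byte each in UTF-8.
accented = count_units("caf\u00e9 na\u00efve")

print(english)
print(accented)
```

Word-level splitting gives the shortest sequence but needs a vocabulary entry for every distinct word. Character and byte splitting need only a tiny vocabulary but make the sequence about five times longer for this sentence.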

Slicing up the text

Before text can enter a language model, it has to be rewritten as numbers. Tokenization is the step that does this. It splits text into small reusable pieces called tokens. A token can be a whole word, part of a word, punctuation, a number, or even a space plus the start of the next word. Each token has an entry in the tokenizer's vocabulary and is replaced by its corresponding integer ID. From that point on, the model is no longer working with characters directly. It sees an ordered list of token IDs.
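The lookup step itself is just a dictionary in each direction. The sketch below uses four pieces and IDs taken from this chapter's o200k_base example. The names `PIECE_TO_ID`, `encode`, and `decode` are invented here, and the text is assumed to be already split into pieces, which is the hard part a real tokenizer handles.

```python
# Pieces and IDs as listed in this chapter's o200k_base example.
# Note that the leading space is part of the token itself.
PIECE_TO_ID = {"If": 3335, " the": 290, " human": 5396, " brain": 12891}
ID_TO_PIECE = {token_id: piece for piece, token_id in PIECE_TO_ID.items()}

def encode(pieces):
    """Replace each piece with its integer ID from the vocabulary."""
    return [PIECE_TO_ID[piece] for piece in pieces]

def decode(token_ids):
    """Map IDs back to pieces and concatenate them, with no separator."""
    return "".join(ID_TO_PIECE[token_id] for token_id in token_ids)

ids = encode(["If", " the", " human", " brain"])
restored = decode(ids)
print(ids, repr(restored))
```

Because spaces live inside the tokens, decoding is plain concatenation and the original text comes back exactly.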

Why not just use words? Whole words are too rigid. New names, typos, code, inflections, compound words, and multilingual text would constantly produce words the model has never seen before.
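A small sketch shows the failure mode. It assumes a hypothetical five-entry word vocabulary in which ID 0 is reserved for anything unseen, a common convention for word-level vocabularies but not how the tokenizer in this guide works.

```python
# Hypothetical word-level vocabulary; ID 0 stands for "unknown word".
WORD_VOCAB = {"<unk>": 0, "the": 1, "model": 2, "reads": 3, "text": 4}

def encode_words(text):
    """Map each lowercase word to its ID, or to <unk> if it was never seen."""
    unknown = WORD_VOCAB["<unk>"]
    return [WORD_VOCAB.get(word, unknown) for word in text.lower().split()]

seen = encode_words("The model reads text")
unseen = encode_words("The modle reads tiktoken")  # a typo and a new name
print(seen, unseen)
```

Both the typo and the new name collapse into the same ID, so the model can no longer tell them apart.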

Why not just use letters or bytes? That solves the "unknown word" problem, but makes every input much longer. More pieces means more work for the model and less context fits in the same window. Subword tokens are the reasonable compromise: common text stays compact, while unusual text can still be built from smaller pieces.
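The compromise can be sketched with a greedy longest-match splitter over a toy vocabulary. This is a deliberate simplification: tokenizers such as the ones in tiktoken use byte pair encoding, which builds and applies its vocabulary differently. The vocabulary below is invented for illustration, with single letters included as the fallback pieces.

```python
# Hypothetical vocabulary: a few frequent pieces plus single letters as fallback.
VOCAB = {"token", "ization", "un", "believ", "able", " "}
VOCAB |= set("abcdefghijklmnopqrstuvwxyz")

def greedy_tokenize(text, vocab=VOCAB):
    """Repeatedly take the longest vocabulary piece that matches at the cursor."""
    max_len = max(len(piece) for piece in vocab)
    pieces = []
    pos = 0
    while pos < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in vocab:
                pieces.append(candidate)
                pos += length
                break
        else:
            raise ValueError(f"no vocabulary piece covers {text[pos]!r}")
    return pieces

common = greedy_tokenize("tokenization")
rare = greedy_tokenize("unbelievable zorb")
print(common)
print(rare)
```

The common word stays compact at two pieces, while the invented word "zorb" falls back to single letters. Nothing is ever "unknown", it just costs more tokens.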

Below you can experiment with OpenAI's o200k_base tokenizer. Try switching sentences and watch where the boundaries land. Later in this explainer, when the model predicts the next token, it predicts over this same vocabulary.

Technical note: the examples below are generated with tiktoken using the o200k_base encoding.

Example sentence: Classic cognition quote (long sentence)

Raw sentence: If the human brain were so simple that we could understand it, we would be so simple that we couldn't.

102 characters, 22 tokens, about 5 characters per token on average (102 / 22 ≈ 4.6)

Tokenized result If #3335

·the #290

·human #5396

·brain #12891

·were #1504

·so #813

·simple #4705

·that #484

·we #581

·could #2023

·understand #4218

·it #480

#11

·we #581

·would #1481

·be #413

·so #813

·simple #4705

·that #484

·we #581

·couldn't #21149

#13
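A listing like this can be reproduced with the tiktoken library, roughly as sketched below. The helper names `tokenize_o200k`, `with_whitespace_markers`, and `format_token` are made up for this example. The tiktoken import sits inside the function so the formatting helpers work without the package. Running `tokenize_o200k` requires tiktoken to be installed and able to fetch the encoding data on first use.

```python
def tokenize_o200k(text):
    """Return (token_id, token_text) pairs using the o200k_base encoding."""
    import tiktoken  # third-party package: pip install tiktoken

    enc = tiktoken.get_encoding("o200k_base")
    pairs = []
    for token_id in enc.encode(text):
        # A single token may be only part of a multi-byte character,
        # so decode its raw bytes leniently.
        raw = enc.decode_single_token_bytes(token_id)
        pairs.append((token_id, raw.decode("utf-8", errors="replace")))
    return pairs

def with_whitespace_markers(piece):
    """Make spaces visible, as in the listing above."""
    return piece.replace(" ", "\u00b7")

def format_token(piece, token_id):
    """Format one row as token text followed by '#<id>'."""
    return f"{with_whitespace_markers(piece)} #{token_id}"

# Example usage (needs tiktoken):
# for token_id, piece in tokenize_o200k("If the human brain"):
#     print(format_token(piece, token_id))
```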


Important takeaway

Tokenization is not just preprocessing. It determines what the model can see in one context window, how expensive your text is, and which pieces the model is allowed to predict next.
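The first two effects come down to simple arithmetic on token counts. The sketch below uses the ratio from the example sentence (102 characters, 22 tokens) to make a rough estimate. The context window size and price are placeholder values, not figures for any real model, and a reliable count always needs the model's own tokenizer.

```python
import math

def chars_per_token(num_chars, num_tokens):
    """Average characters per token for an already tokenized text."""
    return num_chars / num_tokens

def estimate_tokens(num_chars, ratio):
    """Rough token estimate from a character count; rounds up."""
    return math.ceil(num_chars / ratio)

def fits_in_window(prompt_tokens, reserved_output_tokens, context_window):
    """The prompt and the generated output share one context window."""
    return prompt_tokens + reserved_output_tokens <= context_window

def cost(tokens, price_per_million):
    """Cost when pricing is quoted per million tokens."""
    return tokens / 1_000_000 * price_per_million

ratio = chars_per_token(102, 22)        # from the example sentence
prompt = estimate_tokens(1000, ratio)   # a hypothetical 1000-character prompt

CONTEXT_WINDOW = 8192       # placeholder window size
PRICE_PER_MILLION = 2.50    # placeholder price

print(round(ratio, 2), prompt)
print(fits_in_window(prompt, 512, CONTEXT_WINDOW))
print(cost(prompt, PRICE_PER_MILLION))
```

The ratio is only a property of this one English sentence. Code, other languages, and unusual names can tokenize far less efficiently.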

One word is not one token

Different models use different tokenizers. The same sentence can become a different number of tokens depending on the model.

Chapter 02 Vector Embeddings Token IDs are just...
