How LLMs Work: A Friendly Map for Humans

How LLMs Actually Work: A Friendly Map for Humans • oreoro

June 6, 2026

LLMs are not magic brains. They are prediction machines built from a few repeatable parts: tokens, vectors, attention, memory-like feed-forward layers, and a loop that keeps choosing the next likely piece of text.

- The whole idea in one minute
- 1. Tokens: the model's alphabet is not your alphabet
- 2. Embeddings: IDs become meaning-shaped numbers
- 3. Position: the model needs word order
- 4. Attention: tokens decide what to pay attention to
- 5. Multi-head attention: many views at once
- 6. Feed-forward networks: where a lot of learned structure lives
- 7. Residual stream and normalization: keeping deep models trainable
- 8. Next-token prediction: the answer is built one piece at a time
- 9. Architecture vs weights: why models feel different
- 10. GPT-2 and MoE: two useful milestones
  - GPT-2: scaling the next-token game
  - MoE: not every token needs the whole building
- 11. The AI ecosystem: MCP, tools, RAG, agents, and evals
  - MCP: the USB-C idea for AI tools
  - RAG: giving the model an open book
  - Agents: the loop around the model
- A friendly checklist for understanding any LLM answer
- Further reading
- Interlinked Content

✍️ Source note: this is an original, beginner-friendly rewrite inspired by Kato's article How LLMs Actually Work, with extra examples, code, tables, and Notion-native structure.

The whole idea in one minute

An LLM, or large language model, takes your text, turns it into numbers, runs those numbers through many transformer layers, and predicts what text should come next. That is the simple version. The useful version is this:

- Your prompt is split into tokens, which are small text pieces.
- Each token becomes a vector, which is a list of numbers that carries learned meaning.
- The model adds information about order, because "dog bites man" and "man bites dog" do not mean the same thing.
- Attention lets each token decide which earlier tokens matter.
- A feed-forward network does deeper processing for each token.
- Residual connections and normalization keep the many layers stable.
- The model outputs scores for the next possible token.
- One token is chosen, added to the text, and the loop repeats.
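The last two steps, scoring and choosing, can be sketched in a few lines. This is a minimal illustration, not how any particular model is implemented: the candidate tokens and their scores are invented, `softmax` and `pickGreedy` are hypothetical helper names, and always taking the top score (greedy choice) is only one of several selection strategies; many systems sample from the probabilities instead.

```javascript
// Toy scores (logits) for a handful of candidate next tokens.
// The tokens and numbers are made up for illustration.
const candidates = [" poetry", " code", " letters", " nothing"];
const logits = [2.0, 1.0, 0.5, -1.0];

// Softmax turns raw scores into probabilities that sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Greedy choice: return the index of the highest probability.
function pickGreedy(probs) {
  return probs.indexOf(Math.max(...probs));
}

const probs = softmax(logits);
const chosen = candidates[pickGreedy(probs)];
console.log(chosen); // the highest-scoring candidate, " poetry"
```

In a real model the list of candidates is the whole vocabulary, so the scores form a vector with one entry per token.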

    flowchart LR
      A["You type a prompt"] --> B["Tokenizer: text pieces"]
      B --> C["Embeddings: meaning as numbers"]
      C --> D["Position signal: word order"]
      D --> E["Attention: what should matter?"]
      E --> F["Feed-forward layer: deeper processing"]
      F --> G["Next-token scores"]
      G --> H["Pick one token"]
      H --> I["Add it to the text"]
      I --> E

A good mental model: an LLM is like an autocomplete system that has read a massive library and learned incredibly subtle patterns about what usually follows what.
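To make the autocomplete picture concrete, here is a deliberately tiny sketch that counts which word follows which in a few sentences and then generates text by repeatedly choosing the most frequent follower. Everything in it is an assumption for illustration: the corpus, the word-level splitting, and the function names `mostLikelyNext` and `autocomplete`. A real LLM uses a neural network over tokens rather than a table of counts, but the predict, append, repeat loop has the same shape.

```javascript
// A tiny "library" to learn from. Real models train on vastly more text.
const corpus = "the robot writes poetry . the robot writes code . the cat sleeps .";
const words = corpus.split(" ");

// Count how often each word follows each other word.
const followCounts = new Map();
for (let i = 0; i < words.length - 1; i++) {
  const current = words[i];
  const next = words[i + 1];
  if (!followCounts.has(current)) followCounts.set(current, new Map());
  const counts = followCounts.get(current);
  counts.set(next, (counts.get(next) ?? 0) + 1);
}

// Return the most frequent follower of a word, or null if none was seen.
// Ties go to whichever follower was seen first.
function mostLikelyNext(word) {
  const counts = followCounts.get(word);
  if (!counts) return null;
  let best = null;
  let bestCount = 0;
  for (const [candidate, count] of counts) {
    if (count > bestCount) {
      best = candidate;
      bestCount = count;
    }
  }
  return best;
}

// The generation loop: predict, append, repeat.
function autocomplete(start, maxSteps) {
  const output = [start];
  for (let step = 0; step < maxSteps; step++) {
    const next = mostLikelyNext(output[output.length - 1]);
    if (next === null || next === ".") break; // stop at end of sentence
    output.push(next);
  }
  return output.join(" ");
}

console.log(autocomplete("the", 5)); // the robot writes poetry
```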

| Part | Plain-English job | Why it matters |
| --- | --- | --- |
| Tokens | Break text into pieces | The model cannot read raw words or letters directly. |
| Embeddings | Turn pieces into meaning-shaped numbers | Similar ideas can sit near each other in number-space. |
| Position | Tell the model where each piece appears | Order changes meaning. |
| Attention | Let tokens look at useful previous tokens | This is how context flows through the sentence. |
| Feed-forward network | Process each token more deeply | A lot of learned structure lives here. |
| Next-token prediction | Score likely continuations | This is the generation loop behind every answer. |

1. Tokens: the model's alphabet is not your alphabet

Models do not see your sentence the way you do. You see words. The model sees token IDs. A tokenizer might split a sentence like this:

    Text:   "The sleepy robot writes poetry."
    Tokens: ["The", " sleepy", " robot", " writes", " poetry", "."]
    IDs:    [791, 47823, 11205, 13004, 24465, 13]

Those ID numbers are what enter the model. The specific numbers differ across model families, but the pattern is the same: text becomes a sequence of integers. Why not just use whole words? Because language is messy. New names, typos, code, slang, and other languages would explode the vocabulary. Tokens sit between letters and words: flexible enough for rare text, efficient enough for common text.
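One way to see how pieces can sit between letters and words is a greedy longest-match splitter over a small vocabulary. This is a simplified sketch under stated assumptions: the `pieces` list is invented, and production tokenizers typically use learned schemes such as byte-pair encoding rather than this exact rule. It does show the key property, though: a common word stays whole, while a rarer form falls back to smaller pieces.

```javascript
// Hypothetical toy vocabulary: a few common pieces plus single letters as a fallback.
const pieces = ["robot", "sleep", "ing", "y", "s", "l", "e", "p", "r", "o", "b", "t", "i", "n", "g"];
const vocab = new Set(pieces);

// Greedy longest-match: at each position, take the longest piece in the vocabulary.
function tokenize(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    let end = text.length;
    // Shrink the candidate until it is a known piece (or a single character).
    while (end > pos + 1 && !vocab.has(text.slice(pos, end))) end--;
    tokens.push(text.slice(pos, end)); // unknown single characters pass through as-is
    pos = end;
  }
  return tokens;
}

console.log(tokenize("robot"));    // ["robot"]
console.log(tokenize("sleeping")); // ["sleep", "ing"]
console.log(tokenize("robots"));   // ["robot", "s"]
```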

Slightly technical: why the strawberry counting problem happens

When you ask a model how many letters are in a word, the model may not be looking at separate letters. It may see a word as one or a few tokens. That means character-level questions can be awkward unless the model deliberately reasons about spelling.
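A small sketch of the mismatch, assuming a hypothetical split of "strawberry" into two pieces (real tokenizers may divide the word differently). Counting letters is trivial on the character view, but the token view only offers two opaque pieces until they are spelled back out.

```javascript
// Hypothetical split: real tokenizers may divide this word differently.
const word = "strawberry";
const tokenView = ["straw", "berry"];

// What a person sees: individual characters, easy to count.
const countInCharacters = [...word].filter((ch) => ch === "r").length;

// What the model receives: a couple of opaque pieces, not ten letters.
const piecesSeen = tokenView.length;

// Counting letters requires spelling the pieces back out first.
const countAfterSpelling = tokenView
  .join("")
  .split("")
  .filter((ch) => ch === "r").length;

console.log(countInCharacters);  // 3
console.log(piecesSeen);         // 2
console.log(countAfterSpelling); // 3
```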

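    // Toy vocabulary: maps each text piece to its token ID.
    const vocabulary = {
      "The": 791,
      " sleepy": 47823,
      " robot": 11205,
      " writes": 13004,
      " poetry": 24465,
      ".": 13,
    };

    // The prompt, already split into pieces by the tokenizer.
    const prompt = ["The", " sleepy", " robot", " writes", " poetry", "."];

    // Replace each piece with its integer ID.
    const tokenIds = prompt.map((piece) => vocabulary[piece]);

    console.log(tokenIds); // [791, 47823, 11205, 13004, 24465, 13]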

2. Embeddings: IDs become meaning-shaped numbers

A token ID by itself is just a label. ID 11205 does not mean robot unless the model has a learned table that says what vector should represent that token. That table is called the embedding matrix. Think of it as a huge spreadsheet:

- Every token ID gets one row.
- Every row...
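The lookup idea can be sketched as a table keyed by token ID. The values here are invented and the rows hold only four numbers; real models learn their rows during training and use far wider vectors. A `Map` stands in for the matrix only because this toy table has just three rows, and `embed` is a hypothetical helper name.

```javascript
// Toy embedding table: one row per token ID, here with 4 numbers per row.
// All values are invented for illustration.
const embeddingMatrix = new Map([
  [791, [0.1, -0.3, 0.7, 0.0]],   // "The"
  [11205, [0.9, 0.2, -0.4, 0.5]], // " robot"
  [13, [0.0, 0.0, 0.1, -0.2]],    // "."
]);

// Look up the row for each token ID.
function embed(tokenIds) {
  return tokenIds.map((id) => {
    const row = embeddingMatrix.get(id);
    if (!row) throw new Error(`No embedding row for token ID ${id}`);
    return row;
  });
}

const vectors = embed([791, 11205, 13]);
console.log(vectors.length); // 3 vectors, one per token
console.log(vectors[1]);     // the row for " robot"
```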
