How LLMs Work: A Friendly Map for Humans


How LLMs Actually Work: A Friendly Map for Humans • oreoro

June 6, 2026

LLMs are not magic brains. They are prediction machines built from a few repeatable parts: tokens, vectors, attention, memory-like feed-forward layers, and a loop that keeps choosing the next likely piece of text.

The whole idea in one minute
1. Tokens: the model's alphabet is not your alphabet
2. Embeddings: IDs become meaning-shaped numbers
3. Position: the model needs word order
4. Attention: tokens decide what to pay attention to
5. Multi-head attention: many views at once
6. Feed-forward networks: where a lot of learned structure lives
7. Residual stream and normalization: keeping deep models trainable
8. Next-token prediction: the answer is built one piece at a time
9. Architecture vs weights: why models feel different
10. GPT-2 and MoE: two useful milestones
    GPT-2: scaling the next-token game
    MoE: not every token needs the whole building
11. The AI ecosystem: MCP, tools, RAG, agents, and evals
    MCP: the USB-C idea for AI tools
    RAG: giving the model an open book
    Agents: the loop around the model
A friendly checklist for understanding any LLM answer
Further reading
Interlinked Content

Source note: this is an original, beginner-friendly rewrite inspired by Kato's article "How LLMs Actually Work", with extra examples, code, tables, and Notion-native structure.

The whole idea in one minute

An LLM, or large language model, takes your text, turns it into numbers, runs those numbers through many transformer layers, and predicts the next piece of text.

That is the simple version. The useful version is this:
Your prompt is split into tokens, which are small text pieces.
Each token becomes a vector, which is a list of numbers that carries learned meaning.
The model adds information about order, because "dog bites man" and "man bites dog" do not mean the same thing.
Attention lets each token decide which earlier tokens matter.
A feed-forward network does deeper processing for each token.
Residual connections and normalization keep the many layers stable.
The model outputs a score for every possible next token in its vocabulary.
One token is chosen, added to the text, and the loop repeats.

flowchart LR
A["You type a prompt"] --> B["Tokenizer: text pieces"]
B --> C["Embeddings: meaning as numbers"]
C --> D["Position signal: word order"]
D --> E["Attention: what should matter?"]
E --> F["Feed-forward layer: deeper processing"]
F --> G["Next-token scores"]
G --> H["Pick one token"]
H --> I["Add it to the text"]
I --> C
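The last two steps of the list (score every candidate, then pick one) can be sketched in a few lines of JavaScript. This is a minimal illustration under made-up assumptions: the four candidate tokens and their scores (logits) are invented, and a real model scores its whole vocabulary rather than four words. `softmax`, `pickGreedy`, and `pickSampled` are hypothetical helper names, not part of any library.

```javascript
// Hypothetical scores (logits) a model might output for four candidate
// next tokens after "The sleepy robot writes". Real models score every
// token in a vocabulary of tens of thousands of entries; these are made up.
const candidates = [" poetry", " code", " slowly", " banana"];
const logits = [3.1, 2.4, 1.0, -2.0];

// Softmax turns raw scores into probabilities that sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Greedy decoding: always pick the single most likely token.
function pickGreedy(tokens, probs) {
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return tokens[best];
}

// Sampling: pick a token at random, weighted by its probability.
// `rand` is a number in [0, 1); pass Math.random() in practice.
function pickSampled(tokens, probs, rand) {
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i];
    if (rand < cumulative) return tokens[i];
  }
  return tokens[tokens.length - 1]; // guard against floating-point rounding
}

const probs = softmax(logits);
console.log(probs.map((p) => p.toFixed(3)));
console.log(pickGreedy(candidates, probs)); // " poetry"
console.log(pickSampled(candidates, probs, Math.random()));
```

Greedy picking always returns the same token for the same scores, while sampling can return a different one each run, which is one reason the same prompt can produce different answers. Real systems add settings such as temperature on top of this; the sketch leaves them out.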

A good mental model: an LLM is like an autocomplete system that has read a massive library and learned incredibly subtle patterns about what usually follows what.
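To make the autocomplete comparison concrete, here is a deliberately crude version: a hypothetical bigram counter that only remembers which word most often followed each word in a tiny invented training text. It is not how an LLM works inside (no vectors, no attention, whole words instead of tokens), but the outer loop (predict, append, repeat) has the same shape. `trainBigrams`, `predictNext`, and `generate` are names made up for this sketch.

```javascript
// A tiny made-up "training library".
const trainingText =
  "the robot writes poetry . the robot reads poetry . the cat writes code .";

// Count how often each word follows each other word (bigram counts).
function trainBigrams(text) {
  const words = text.split(" ");
  const counts = new Map(); // word -> Map(nextWord -> count)
  for (let i = 0; i < words.length - 1; i++) {
    const current = words[i];
    const next = words[i + 1];
    if (!counts.has(current)) counts.set(current, new Map());
    const followers = counts.get(current);
    followers.set(next, (followers.get(next) ?? 0) + 1);
  }
  return counts;
}

// Return the most frequent follower of `word`, or null if it was never seen.
// Ties go to the follower seen first, because Map keeps insertion order.
function predictNext(counts, word) {
  const followers = counts.get(word);
  if (!followers) return null;
  let best = null;
  let bestCount = -1;
  for (const [next, count] of followers) {
    if (count > bestCount) {
      best = next;
      bestCount = count;
    }
  }
  return best;
}

// The generation loop: predict one word, append it, repeat.
function generate(counts, prompt, maxNewWords) {
  const output = prompt.split(" ");
  for (let i = 0; i < maxNewWords; i++) {
    const next = predictNext(counts, output[output.length - 1]);
    if (next === null) break; // no known continuation: stop early
    output.push(next);
  }
  return output.join(" ");
}

const model = trainBigrams(trainingText);
console.log(generate(model, "the", 4)); // "the robot writes poetry ."
```

An LLM replaces the counting table with a large network of learned weights that looks at the whole context, not just the previous word, but it still extends the text one piece at a time.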

| Part | Plain-English job | Why it matters |
|---|---|---|
| Tokens | Break text into pieces | The model cannot read raw words or letters directly. |
| Embeddings | Turn pieces into meaning-shaped numbers | Similar ideas can sit near each other in number-space. |
| Position | Tell the model where each piece appears | Order changes meaning. |
| Attention | Let tokens look at useful previous tokens | This is how context flows through the sentence. |
| Feed-forward network | Process each token more deeply | A lot of learned structure lives here. |
| Next-token prediction | Score likely continuations | This is the generation loop behind every answer. |

1. Tokens: the model's alphabet is not your alphabet

Models do not see your sentence the way you do. You see words. The model sees token IDs.

A tokenizer might split a sentence like this:

Text: "The sleepy robot writes poetry."
Tokens: ["The", " sleepy", " robot", " writes", " poetry", "."]
IDs: [791, 47823, 11205, 13004, 24465, 13]

Those ID numbers are what enter the model. The specific numbers differ across model families, but the pattern is the same: text becomes a sequence of integers.

Why not just use whole words? Because language is messy. New names, typos, code, slang, and other languages would make a word-level vocabulary explode, and any word missing from it could not be represented at all. Tokens sit between letters and words: flexible enough to spell out rare text from smaller pieces, efficient enough to cover common text in a single piece.
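A minimal sketch of the "between letters and words" idea, assuming a tiny made-up vocabulary of a few common pieces plus every single lowercase letter as a fallback. It uses greedy longest-match splitting, which is close in spirit to WordPiece; GPT-style models use byte-pair encoding (BPE) instead, so treat this as an illustration of the trade-off rather than any real model's tokenizer. Spaces, capital letters, and punctuation are left out for brevity, and `tokenize` is a hypothetical name.

```javascript
// Hypothetical vocabulary: a few common multi-letter pieces plus every
// single lowercase letter, so any lowercase word can still be encoded.
const letters = "abcdefghijklmnopqrstuvwxyz".split("");
const pieces = ["robot", "un", "believ", "able", ...letters];
const vocab = new Map(pieces.map((piece, id) => [piece, id]));
const maxPieceLength = Math.max(...pieces.map((p) => p.length));

// Greedy longest-match: at each position, take the longest vocabulary
// piece that matches there, then continue right after it.
function tokenize(word) {
  const tokens = [];
  let start = 0;
  while (start < word.length) {
    let end = Math.min(word.length, start + maxPieceLength);
    while (end > start && !vocab.has(word.slice(start, end))) end--;
    if (end === start) throw new Error(`No piece for "${word[start]}"`);
    tokens.push(word.slice(start, end));
    start = end;
  }
  return tokens;
}

console.log(tokenize("robots"));       // ["robot", "s"]
console.log(tokenize("unbelievable")); // ["un", "believ", "able"]
console.log(tokenize("robotics"));     // ["robot", "i", "c", "s"]
```

Words built from common pieces come out short, while a word like "robotics" that has no whole-word entry still encodes, just in more pieces. That is the flexibility-versus-efficiency trade-off described above.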

Slightly technical: why the strawberry counting problem happens

When you ask a model how many times a letter appears in a word (the classic case is counting the r's in "strawberry"), the model may not be looking at separate letters. It may see the word as one or a few tokens, so character-level questions can be awkward unless the model deliberately reasons about spelling.
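A small sketch of the mismatch, assuming a hypothetical three-piece split of "strawberry" (real tokenizers may split it differently or keep it whole). Counting letters is trivial at the character level; the difficulty is that the model receives the pieces, not the letters.

```javascript
// Hypothetical split; the exact pieces depend on the tokenizer.
const word = "strawberry";
const pieces = ["str", "aw", "berry"];
console.log(pieces.join("") === word); // true: the pieces rebuild the word

// What the model receives: three pieces, not ten letters.
console.log(pieces.length); // 3

// What the question is about: individual characters.
const letters = word.split("");
console.log(letters.length); // 10

// Counting a letter is easy once you work at the character level...
const countR = letters.filter((ch) => ch === "r").length;
console.log(countR); // 3

// ...but from the model's side the r's are hidden inside pieces.
const rPerPiece = pieces.map((p) => [p, p.split("").filter((ch) => ch === "r").length]);
console.log(rPerPiece); // [["str", 1], ["aw", 0], ["berry", 2]]
```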

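// A toy vocabulary: each text piece maps to a token ID.
// These IDs are illustrative; every tokenizer has its own table.
const vocabulary = {
  "The": 791,
  " sleepy": 47823,
  " robot": 11205,
  " writes": 13004,
  " poetry": 24465,
  ".": 13,
};

// The prompt, already split into pieces (note the leading spaces).
const prompt = ["The", " sleepy", " robot", " writes", " poetry", "."];

// Look up each piece's ID; these integers are what the model receives.
const tokenIds = prompt.map((piece) => vocabulary[piece]);

console.log(tokenIds);
// [791, 47823, 11205, 13004, 24465, 13]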
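The lookup also runs in reverse: once the model picks an output token ID, that ID is turned back into text. A minimal sketch, reusing the same illustrative vocabulary (repeated so the block runs on its own); `encode`, `decode`, and `idToPiece` are hypothetical names. Unlike the plain lookup above, which silently yields `undefined` for a piece missing from the table, this version throws an error; real subword tokenizers largely avoid that case by falling back to smaller pieces.

```javascript
// Same illustrative vocabulary as above.
const vocabulary = {
  "The": 791,
  " sleepy": 47823,
  " robot": 11205,
  " writes": 13004,
  " poetry": 24465,
  ".": 13,
};

// Invert the table: ID -> text piece.
const idToPiece = new Map(
  Object.entries(vocabulary).map(([piece, id]) => [id, piece])
);

// Text pieces -> IDs, failing loudly on anything not in the table.
function encode(pieces) {
  return pieces.map((piece) => {
    const id = vocabulary[piece];
    if (id === undefined) throw new Error(`Unknown piece: ${JSON.stringify(piece)}`);
    return id;
  });
}

// IDs -> text: look up each piece and glue them together.
function decode(ids) {
  return ids
    .map((id) => {
      const piece = idToPiece.get(id);
      if (piece === undefined) throw new Error(`Unknown id: ${id}`);
      return piece;
    })
    .join("");
}

const ids = encode(["The", " sleepy", " robot", " writes", " poetry", "."]);
console.log(decode(ids)); // "The sleepy robot writes poetry."
```

The leading spaces stored inside pieces like " robot" are why decoding is a plain `join("")`: the spacing travels with the tokens.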

2. Embeddings: IDs become meaning-shaped numbers

A token ID by itself is just a label. ID 11205 does not mean "robot" unless the model has a learned table that says what vector should represent that token.

That table is called the embedding matrix. Think of it as a huge spreadsheet:
Every token ID gets one row.
Every row...
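A minimal sketch of an embedding lookup, assuming a tiny made-up table: three tokens with invented IDs 0 to 2 (not the IDs used earlier) and three numbers per row, where real models use one row per vocabulary entry and hundreds or thousands of numbers per row. Looking up a token is just selecting its row; cosine similarity is one common way to check whether two vectors point the same way. `embed` and `cosine` are hypothetical helper names.

```javascript
// Invented token IDs and invented 3-number rows, for illustration only.
const tokenIds = { " robot": 0, " android": 1, " banana": 2 };
const embeddingMatrix = [
  [0.9, 0.1, 0.3],  // row 0: " robot"
  [0.8, 0.2, 0.35], // row 1: " android"
  [0.1, 0.9, 0.0],  // row 2: " banana"
];

// Embedding lookup is just indexing: the token ID picks a row.
function embed(id) {
  return embeddingMatrix[id];
}

// Cosine similarity: close to 1 when two vectors point the same way.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const robot = embed(tokenIds[" robot"]);
const android = embed(tokenIds[" android"]);
const banana = embed(tokenIds[" banana"]);
console.log(cosine(robot, android).toFixed(3)); // high: similar rows
console.log(cosine(robot, banana).toFixed(3));  // low: different rows
```

With these invented numbers, " robot" and " android" come out far more similar than " robot" and " banana". In a trained model, that kind of closeness is learned from data rather than typed in by hand.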

