How to Build ChatGPT from Scratch: Understanding LLMs Step by Step

Modern AI tools are everywhere.

Developers use ChatGPT to write code, Claude to review pull requests, Gemini to summarize documents, and open-source models to power everything from customer support bots to autonomous agents.

Yet for many people, Large Language Models still feel mysterious.

Terms like:

Embeddings

Attention

Transformers

Context Windows

Chain of Thought

Reasoning Models

appear constantly in AI discussions, but most explanations immediately jump into advanced mathematics.

The problem is that modern LLMs did not appear overnight.

Every major breakthrough solved a limitation in the previous generation of models.

If we follow that progression step by step, modern AI becomes dramatically easier to understand.

The journey looks something like this:

Bigram Model → RNN → Attention → Transformer → LLM → Reasoning Models

Each stage introduced a new capability:

Bigram Models learned token relationships

RNNs introduced memory

Attention solved long-context limitations

Transformers removed recurrence

LLMs scaled the architecture massively

Reasoning models learned to generate intermediate thinking steps

Let’s start at the very beginning.

Step 1: Building the Simplest Language Model

Imagine a tiny dataset containing developer-related phrases.

Build React Dashboard
Build Next.js Blog
Deploy React Dashboard
Deploy Next.js Blog

The model’s task is simple:

Predict the next token.

Examples:

P(React | Build)

P(Next.js | Build)

P(Blog | Next.js)

P(Dashboard | React)

This is called a Bigram Model.

A toy vocabulary might look like this:

const vocabulary = [
  'Build',
  'Deploy',
  'React',
  'Next.js',
  'Dashboard',
  'Blog',
  '',
];

Internally, the model learns transition probabilities:

Build → React (60%)

Build → Next.js (40%)

This already qualifies as a language model.
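To make the idea concrete, here is a minimal sketch of how such transition probabilities could be counted from the toy dataset. The helper names `countBigrams` and `toProbabilities` are made up for illustration, and splitting phrases on spaces is an assumption, not a real tokenizer.

```javascript
// Toy corpus from the article; each phrase is split on spaces into tokens.
const corpus = [
  'Build React Dashboard',
  'Build Next.js Blog',
  'Deploy React Dashboard',
  'Deploy Next.js Blog',
];

// Count how often each token is followed by each other token.
function countBigrams(phrases) {
  const counts = {};
  for (const phrase of phrases) {
    const tokens = phrase.split(' ');
    for (let i = 0; i < tokens.length - 1; i++) {
      const current = tokens[i];
      const next = tokens[i + 1];
      counts[current] = counts[current] || {};
      counts[current][next] = (counts[current][next] || 0) + 1;
    }
  }
  return counts;
}

// Turn raw counts into transition probabilities P(next | current).
function toProbabilities(counts) {
  const probabilities = {};
  for (const [current, followers] of Object.entries(counts)) {
    const total = Object.values(followers).reduce((sum, n) => sum + n, 0);
    probabilities[current] = {};
    for (const [next, n] of Object.entries(followers)) {
      probabilities[current][next] = n / total;
    }
  }
  return probabilities;
}

const transitions = toProbabilities(countBigrams(corpus));
console.log(transitions.Build);
```

On this four-phrase corpus, React and Next.js follow Build equally often, so both come out at 0.5. A split like 60% and 40% would need a corpus where React follows Build more often.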

But there’s a huge limitation.

The model only sees one token at a time.

It has no memory.

Step 2: Why Memory Matters

Consider these examples:

Build React Dashboard
Build React CRM
Build Next.js Blog
Build Next.js Store

When the model sees:

Dashboard

it no longer remembers that React appeared earlier.

Every prediction depends only on the current token.
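A small sketch can show this limitation directly. The `predictNext` helper below is hypothetical: it receives the whole context but, like any bigram model, reads only the final token, so two different histories ending in the same token give the same prediction.

```javascript
// Step 2 phrases from the article, split on spaces into tokens.
const phrases = [
  'Build React Dashboard',
  'Build React CRM',
  'Build Next.js Blog',
  'Build Next.js Store',
];

// Bigram counts: followers[current][next] = number of times seen.
const followers = {};
for (const phrase of phrases) {
  const tokens = phrase.split(' ');
  for (let i = 0; i < tokens.length - 1; i++) {
    const current = tokens[i];
    const next = tokens[i + 1];
    followers[current] = followers[current] || {};
    followers[current][next] = (followers[current][next] || 0) + 1;
  }
}

// Everything before the last token is ignored.
function predictNext(context) {
  const lastToken = context[context.length - 1];
  return followers[lastToken] || {};
}

const afterBuildReact = predictNext(['Build', 'React']);
const afterDeployReact = predictNext(['Deploy', 'React']);
console.log(afterBuildReact, afterDeployReact);
```

Both calls return the same counts for Dashboard and CRM, because the earlier tokens never reach the prediction.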

This becomes a disaster for longer sequences.

Imagine processing:

Build a multi-tenant analytics platform with authentication, reporting, dashboarding, audit logs, user permissions, notifications, and billing support

By the end of the sentence, the beginning has effectively disappeared.

Researchers needed a mechanism for memory.

That led to Recurrent Neural Networks.

Step 3: Enter RNNs

Recurrent Neural Networks introduced the concept of a hidden state.

Instead of processing tokens independently, the model carries information forward.

Example:

Build React Dashboard

A simplified implementation:

let hiddenState = [0, 0];

// Add the input to the previous state, then squash each value with tanh.
function updateState(input, state) {
  return state.map((value, index) => Math.tanh(value + input[index]));
}

Each token updates the hidden state.

The model now remembers previous information.

For the first time, context becomes possible.
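As a rough sketch, the update rule can be run over a whole phrase. The two-dimensional embeddings below are made-up values, and `runSequence` is a hypothetical helper. A trained RNN would also apply learned weight matrices, which this simplified version leaves out.

```javascript
// Hypothetical 2-dimensional embeddings; the numbers are illustrative only.
const embeddings = {
  Build: [0.9, 0.1],
  React: [0.2, 0.8],
  Dashboard: [0.5, 0.5],
};

// Same simplified rule: add the input to the previous state, then
// squash each value into the range (-1, 1) with tanh.
function updateState(input, state) {
  return state.map((value, index) => Math.tanh(value + input[index]));
}

// Feed the tokens one by one, carrying the hidden state forward.
function runSequence(tokens) {
  let state = [0, 0];
  for (const token of tokens) {
    state = updateState(embeddings[token], state);
  }
  return state;
}

const withBuild = runSequence(['Build', 'React', 'Dashboard']);
const withoutBuild = runSequence(['React', 'Dashboard']);
console.log(withBuild, withoutBuild);
```

The two final states differ even though both sequences end in the same tokens. The earlier token Build has left a trace in the hidden state, which is exactly what the bigram model could not do.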

This was a massive improvement.

But it introduced a new problem.

Step 4: The Bottleneck Problem

Imagine reading a 500-page book.

Then trying to summarize it while keeping only a single sticky note.

That is essentially what an RNN does.

Everything must fit into one hidden state.

As sequences become longer:

Information gets compressed

Details disappear

Relationships are lost

The longer the sequence, the worse the problem becomes.
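The fading can be observed even with the toy update rule from Step 3. In the sketch below, two sequences differ only in their first token and are then fed the same filler token repeatedly. All vectors are made up, `gapAfter` is a hypothetical helper, and a trained RNN with learned weights would behave differently in detail.

```javascript
// Same toy update rule as in Step 3 (no learned weights).
function updateState(input, state) {
  return state.map((value, index) => Math.tanh(value + input[index]));
}

// Euclidean distance between two hidden states.
function distance(a, b) {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}

// Start from two different first tokens, then apply the same filler
// token `steps` times and measure how far apart the states still are.
function gapAfter(steps) {
  const filler = [0.5, -0.3];
  let stateA = updateState([1, 0], [0, 0]);
  let stateB = updateState([0, 1], [0, 0]);
  for (let i = 0; i < steps; i++) {
    stateA = updateState(filler, stateA);
    stateB = updateState(filler, stateB);
  }
  return distance(stateA, stateB);
}

const gapEarly = gapAfter(1);
const gapLate = gapAfter(20);
console.log(gapEarly, gapLate);
```

After a single extra token the two states are still clearly different. After twenty, they are almost identical, so the hidden state no longer reveals which token started the sequence.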

Researchers started asking a different question:

What if the model didn’t need to remember everything?

What if it could simply look back whenever necessary?

That idea changed AI forever.

Step 5: Attention Changes Everything

Attention is arguably the most important breakthrough in modern AI.

Consider this phrase:

Build React Analytics Dashboard

While processing:

Dashboard

the model can inspect every previous token.

Example attention scores:

Build → 0.15

React → 0.55

Analytics → 0.30

The model is effectively saying:

While generating “Dashboard”, React is the most relevant piece of information.

This is fundamentally different from memory.

Instead of storing everything, the model retrieves what it needs dynamically.
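Attention scores that sum to 1, like the ones above, are typically produced by a softmax over raw relevance scores. Here is a minimal sketch, assuming made-up raw scores chosen so the result lands close to the example values. A real model computes the raw scores itself.

```javascript
// Previous tokens the model can look back at while producing "Dashboard".
const tokens = ['Build', 'React', 'Analytics'];

// Hypothetical raw relevance scores, one per previous token.
const rawScores = [0.2, 1.5, 0.9];

// Softmax: exponentiate each score, then divide by the total so the
// weights are positive and sum to 1.
function softmax(scores) {
  const max = Math.max(...scores); // subtracting the max avoids overflow
  const exps = scores.map((score) => Math.exp(score - max));
  const total = exps.reduce((sum, value) => sum + value, 0);
  return exps.map((value) => value / total);
}

const weights = softmax(rawScores);
tokens.forEach((token, i) => {
  console.log(`${token} → ${weights[i].toFixed(2)}`);
});
```

The token with the highest raw score, React, receives the largest share of attention, and the weights always add up to 1 no matter how many tokens are compared.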

That single idea became the foundation of modern AI.

Step 6: Query, Key, and Value

Attention works through three concepts:

Query: What am I looking for?

Key: What information do I contain?

Value: What information can I provide?

A simplified example:

const query = [0.8, 0.3];

const key = [0.7, 0.4];

const value = [0.9, 0.1];

The model compares Queries and Keys.

A simple similarity calculation:

// Dot product of two vectors; the initial value 0 makes sure index 0 is included.
function similarity(a, b) {
  return a.reduce((sum, value, index) => sum + value * b[index], 0);
}

const score = similarity(query, key);

The...
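Query, Key, and Value can be combined into one attention step. The sketch below follows the usual scaled dot-product recipe: score each key against the query, normalize the scores with softmax, then mix the values by those weights. The `attend` helper and the second key/value pair are made up for illustration, and this is not necessarily the exact code the article goes on to use.

```javascript
// Dot-product similarity between two vectors of equal length.
function similarity(a, b) {
  return a.reduce((sum, value, index) => sum + value * b[index], 0);
}

// Softmax turns raw scores into positive weights that sum to 1.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((score) => Math.exp(score - max));
  const total = exps.reduce((sum, value) => sum + value, 0);
  return exps.map((value) => value / total);
}

// One query attends over several (key, value) pairs.
function attend(query, keys, values) {
  const scale = Math.sqrt(query.length); // scaling from scaled dot-product attention
  const weights = softmax(keys.map((key) => similarity(query, key) / scale));
  // Weighted sum of the value vectors, one output dimension at a time.
  return values[0].map((_, dim) =>
    values.reduce((sum, value, i) => sum + weights[i] * value[dim], 0)
  );
}

const query = [0.8, 0.3];
// The first key/value pair is from the article; the second is hypothetical.
const keys = [[0.7, 0.4], [0.1, 0.9]];
const values = [[0.9, 0.1], [0.2, 0.7]];

const output = attend(query, keys, values);
console.log(output);
```

Because the query is more similar to the first key, the output sits closer to the first value vector than to the second.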
