LLM from Scratch: a small LLM running inside MIT's Scratch

Repository: Broyojo/llm_from_scratch (GitHub)

LLM From Scratch

Run the smallest llama2.c model (stories260K) inside Scratch/TurboWarp by compiling C inference code to Scratch blocks with llvm2scratch.

If everything is working, the sprite will start generating the familiar opening: Once upon a time, ... (streamed into the speech bubble token-by-token).

Live Demo

Scratch project: https://scratch.mit.edu/projects/1277883263

Credits (Upstream)

This repo vendors two upstream projects in-tree for reproducibility:

llama2.c by Andrej Karpathy (MIT). Source: llama2.c/ and llama2.c/LICENSE.

llvm2scratch by Classfied3D (MIT). Source: llvm2scratch/ and llvm2scratch/LICENSE.

The model/tokenizer artifacts in artifacts/ come from the llama2.c ecosystem.

How It Works

High-level pipeline:

scratch_llama2/build_stories260k_sprite3.py reads:

artifacts/stories260K.bin (the smallest llama2.c checkpoint)

artifacts/tok512.bin (tokenizer vocabulary)

It quantizes the weight matrices to Q8_0 (group size 4) and packs 4 signed int8 values into one u32.

It lays out everything into a single Scratch list !stack:

packed weights + per-group scales

RMSNorm weights

RoPE cos/sin tables (for a reduced SEQ_LEN)

runtime buffers (x/xb/hb/q/att + KV cache)

It writes scratch_llama2/generated_layout.h with 1-indexed addresses into !stack.

It compiles scratch_llama2/llama2_scratch.c to LLVM IR (scratch_llama2/llama2_scratch.ll) using:

clang --target=i386-none-elf (keeps pointers as 32-bit ints)

It runs llvm2scratch to turn LLVM IR into Scratch blocks, then exports .sprite3 and .sb3 outputs.
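The quantize, pack, and layout steps above can be sketched roughly as follows. This is a minimal illustration, not the code in build_stories260k_sprite3.py: the helper names (quantize_group, pack_int8x4, StackLayout), the byte order inside each u32, the symmetric max-abs scaling, and the `#define` header format are all assumptions made for the example.

```python
GROUP_SIZE = 4  # from the README: Q8_0 with group size 4


def quantize_group(values):
    """Quantize one group of floats to signed int8 values plus one scale.

    Symmetric max-abs scaling is an assumption for this sketch.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def pack_int8x4(q):
    """Pack four signed int8 values into one unsigned 32-bit integer.

    Putting the first value in the low byte is an assumption.
    """
    word = 0
    for i, v in enumerate(q):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_int8x4(word):
    """Inverse of pack_int8x4: recover the four signed int8 values."""
    out = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        out.append(byte - 256 if byte >= 128 else byte)
    return out


def quantize_row(row):
    """Return (packed_words, scales) for a row whose length is a multiple of 4."""
    words, scales = [], []
    for start in range(0, len(row), GROUP_SIZE):
        q, scale = quantize_group(row[start:start + GROUP_SIZE])
        words.append(pack_int8x4(q))
        scales.append(scale)
    return words, scales


class StackLayout:
    """Hypothetical append-only allocator over one flat list.

    Addresses are 1-indexed because Scratch lists are 1-indexed.
    """

    def __init__(self):
        self.cells = []
        self.addresses = {}

    def place(self, name, data):
        """Append data and record the 1-indexed address of its first cell."""
        self.addresses[name] = len(self.cells) + 1
        self.cells.extend(data)
        return self.addresses[name]

    def reserve(self, name, count):
        """Reserve zero-filled space for a runtime buffer."""
        return self.place(name, [0] * count)

    def header(self):
        """Render the addresses as C macros (the format is an assumption)."""
        return "\n".join(f"#define {n} {a}" for n, a in self.addresses.items())


words, scales = quantize_row([2.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
layout = StackLayout()
layout.place("W_PACKED", words)
layout.place("W_SCALES", scales)
layout.reserve("X_BUF", 4)
print(layout.header())
```

Packing four int8 values per list cell cuts the number of Scratch list items needed for weights by roughly a factor of four, at the cost of a few shifts and masks when the inference code reads a weight back.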

Runtime UI:

!!output (list) stores generated token IDs.

!!vocab (list) stores token pieces (strings).

!!text (variable) accumulates decoded text; the sprite says it continuously.

!!resets (variable) increments each time the compiled code triggers a broadcast-based “stack reset”; it doubles as a progress indicator, and the resets themselves avoid JavaScript call stack overflows.

!!status (variable) shows a high-level state machine (Edit params... -> Running... -> Done.).

ui_* variables let you adjust sampling/generation settings from TurboWarp/Scratch UI.
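The relationship between !!output, !!vocab, and !!text can be modelled in a few lines of Python. This is only a sketch of the idea: the function name stream_decode and the toy vocabulary are made up for illustration, Python lists are 0-indexed whereas Scratch lists are 1-indexed, and the special cases in llama2.c's real decoder (such as raw byte tokens) are left out.

```python
def stream_decode(token_ids, vocab):
    """Yield the accumulated text after each generated token.

    token_ids plays the role of the !!output list, vocab the !!vocab list,
    and the yielded string the !!text variable that the sprite says.
    """
    text = ""
    for token_id in token_ids:
        text += vocab[token_id]  # append this token's string piece
        yield text


# Toy vocabulary for illustration only (not the contents of tok512.bin).
toy_vocab = ["Once", " upon", " a", " time", ","]

for partial in stream_decode([0, 1, 2, 3, 4], toy_vocab):
    print(partial)
```

Each yielded string corresponds to one update of the speech bubble, which is why the output appears token by token.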

Build

Requires:

clang

uv (and Python >= 3.12; llvm2scratch requires it)

Command:

# If you don't have a usable Python yet:
#   uv python install 3.12
#
# Optional: tune stack reset frequency for TurboWarp stability/perf.
# Lower = more stable (less likely to hit "Maximum call stack size exceeded"), but slower.
# Higher = faster, but can crash in TurboWarp.
# MAX_BRANCH_RECURSION=200 is the default.
#
# Optional: number of tokens to generate (upper bound). Defaults to 20.
# (Must be …)
#
# llvm2scratch requires Python >= 3.12; pin via `--python` to avoid uv picking an older system Python.
MAX_BRANCH_RECURSION=200 \
GEN_STEPS=20 \
uv run --python 3.12 --no-project --with-editable ./llvm2scratch python scratch_llama2/build_stories260k_sprite3.py
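For readers unfamiliar with the `VAR=value command` pattern, the two optional settings reach the build script as environment variables. A script could pick them up along these lines; this is a hedged sketch using the defaults stated above (200 and 20), and the function name read_build_settings is hypothetical rather than taken from build_stories260k_sprite3.py.

```python
import os


def read_build_settings(env=None):
    """Read the optional build settings, falling back to the documented defaults.

    How the real build script parses these variables is an assumption.
    """
    env = os.environ if env is None else env
    return {
        "max_branch_recursion": int(env.get("MAX_BRANCH_RECURSION", "200")),
        "gen_steps": int(env.get("GEN_STEPS", "20")),
    }


print(read_build_settings())
```

Passing a plain dict instead of os.environ makes the defaults easy to check without touching the real environment.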

Outputs:

scratch_llama2/stories260k_inference.sprite3: sprite, blocks hidden (fast editor/import)

scratch_llama2/stories260k_inference_visible.sprite3: sprite, blocks visible (debug)

scratch_llama2/stories260k_inference_visible.sb3: standalone project wrapper around the visible sprite

scratch_llama2/stories260k_inference_visible_scratch.sprite3: Scratch-compatible sprite (no TurboWarp-only blocks)

scratch_llama2/stories260k_inference_visible_scratch.sb3: Scratch-compatible standalone project

Run (TurboWarp)

Sprite workflow:

Import...
