LLM from Scratch: a small LLM running inside MIT's Scratch


GitHub - Broyojo/llm_from_scratch · GitHub


LLM From Scratch

Run the smallest llama2.c model (stories260K) inside Scratch/TurboWarp by compiling C inference code to Scratch blocks with llvm2scratch.

If everything is working, the sprite will start generating the familiar opening:

Once upon a time, ... (streamed into the speech bubble token by token).

Live Demo

Scratch project: https://scratch.mit.edu/projects/1277883263

Credits (Upstream)

This repo vendors two upstream projects in-tree for reproducibility:

llama2.c by Andrej Karpathy (MIT). Source: llama2.c/ and llama2.c/LICENSE.

llvm2scratch by Classfied3D (MIT). Source: llvm2scratch/ and llvm2scratch/LICENSE.

The model/tokenizer artifacts in artifacts/ come from the llama2.c ecosystem.

How It Works

High-level pipeline:

scratch_llama2/build_stories260k_sprite3.py reads:

artifacts/stories260K.bin (the smallest llama2.c checkpoint)

artifacts/tok512.bin (tokenizer vocabulary)

It quantizes the weight matrices to Q8_0 (group size 4) and packs 4 signed int8 values into one u32.

It lays out everything into a single Scratch list !stack:

packed weights + per-group scales

RMSNorm weights

RoPE cos/sin tables (for a reduced SEQ_LEN)

runtime buffers (x/xb/hb/q/att + KV cache)

It writes scratch_llama2/generated_layout.h with 1-indexed addresses into !stack.

It compiles scratch_llama2/llama2_scratch.c to LLVM IR (scratch_llama2/llama2_scratch.ll) using:

clang --target=i386-none-elf (keeps pointers as 32-bit ints)

It runs llvm2scratch to turn LLVM IR into Scratch blocks, then exports .sprite3 and .sb3 outputs.
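The quantize, pack, and layout steps above can be sketched in a few lines of Python. This is a minimal illustration and not the code in build_stories260k_sprite3.py: the function names, the little-endian lane order inside each u32, the section names, and the `*_ADDR` macro naming are all assumptions made for the example. Only the group size of 4, the four-int8-per-u32 packing, and the 1-indexed addressing come from the description above.

```python
import struct

GROUP_SIZE = 4  # group size stated in the README


def quantize_group(values):
    """Symmetric int8 quantization of one group; returns (quants, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero
    quants = [max(-127, min(127, round(v / scale))) for v in values]
    return quants, scale


def pack_int8x4(quants):
    """Pack four signed int8 values into one u32 (little-endian lanes, an assumption)."""
    return struct.unpack("<I", struct.pack("<4b", *quants))[0]


def unpack_int8x4(word):
    """Inverse of pack_int8x4."""
    return list(struct.unpack("<4b", struct.pack("<I", word)))


def quantize_and_pack(weights):
    """Quantize a flat weight list group by group; returns (u32 words, scales)."""
    if len(weights) % GROUP_SIZE != 0:
        raise ValueError("weight count must be a multiple of GROUP_SIZE")
    words, scales = [], []
    for start in range(0, len(weights), GROUP_SIZE):
        quants, scale = quantize_group(weights[start:start + GROUP_SIZE])
        words.append(pack_int8x4(quants))
        scales.append(scale)
    return words, scales


def build_layout(sections):
    """Assign each (name, length) section a 1-indexed start address in one flat list."""
    addresses, cursor = {}, 1  # Scratch lists are 1-indexed
    for name, length in sections:
        addresses[name] = cursor
        cursor += length
    return addresses


def emit_header(addresses):
    """Render the addresses as C #define lines (macro naming is hypothetical)."""
    return "\n".join(
        f"#define {name.upper()}_ADDR {addr}" for name, addr in addresses.items()
    )


# Toy input: two groups of four weights; section names and lengths are made up.
words, scales = quantize_and_pack([127.0, -127.0, 0.0, 64.0, 0.0, 0.0, 0.0, 0.0])
layout = build_layout([("weights", len(words)), ("scales", len(scales)), ("x", 64)])
print(emit_header(layout))
```

Dequantizing a value is then `quant * scale` for the group it belongs to, so the C side only needs the packed word, the lane index, and the group's scale.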

Runtime UI:

!!output (list) stores generated token IDs.

!!vocab (list) stores token pieces (strings).

!!text (variable) accumulates decoded text; the sprite says it continuously.

!!resets (variable) increments when the compiler triggers a broadcast-based “stack reset” (progress indicator + avoids JS call stack blowups).

!!status (variable) shows a high-level state machine (Edit params... -> Running... -> Done.).

ui_* variables let you adjust sampling/generation settings from TurboWarp/Scratch UI.
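The relationship between the output list, the vocab list, and the accumulated text can be illustrated as follows. This is a sketch under assumptions: the function name, the toy vocabulary, and the 0-based indexing of token IDs into the vocabulary are all hypothetical, and the real project does this in generated Scratch blocks rather than Python.

```python
def stream_decode(output_ids, vocab):
    """Yield the accumulated text after each generated token ID."""
    text = ""
    for token_id in output_ids:
        text += vocab[token_id]  # assumes 0-based IDs index straight into vocab
        yield text


vocab = ["Once", " upon", " a", " time"]  # hypothetical token pieces
output = [0, 1, 2, 3]                     # hypothetical generated token IDs
for snapshot in stream_decode(output, vocab):
    print(snapshot)  # each snapshot is what the sprite would "say" at that step
```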

Build

Requires:

clang

uv (and Python >= 3.12; llvm2scratch requires it)

Command:

    # If you don't have a usable Python yet:
    #   uv python install 3.12
    #
    # Optional: tune stack reset frequency for TurboWarp stability/perf.
    # Lower = more stable (less likely to hit "Maximum call stack size exceeded"), but slower.
    # Higher = faster, but can crash in TurboWarp.
    # MAX_BRANCH_RECURSION=200 is the default.
    #
    # Optional: number of tokens to generate (upper bound). Defaults to 20.
    # (Must be
    #
    # llvm2scratch requires Python >= 3.12; pin via `--python` to avoid uv picking an older system Python.
    MAX_BRANCH_RECURSION=200 \
    GEN_STEPS=20 \
    uv run --python 3.12 --no-project --with-editable ./llvm2scratch python scratch_llama2/build_stories260k_sprite3.py
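The stability/speed trade-off behind MAX_BRANCH_RECURSION can be illustrated with a depth-bounded trampoline. This is only an analogy in Python, not how llvm2scratch implements its broadcast-based stack reset: `StackReset`, `step_chain`, and `run` are hypothetical names, and the exception stands in for the broadcast that restarts execution from a shallow stack.

```python
class StackReset(Exception):
    """Carries the state needed to resume after unwinding the call stack."""

    def __init__(self, state):
        super().__init__("stack reset")
        self.state = state


def step_chain(remaining, depth, max_depth, acc):
    """Each 'branch' calls the next one; bail out once the nesting gets too deep."""
    if remaining == 0:
        return acc
    if depth >= max_depth:
        raise StackReset((remaining, acc))
    return step_chain(remaining - 1, depth + 1, max_depth, acc + 1)


def run(total_steps, max_depth):
    """Drive step_chain from a flat loop; returns (result, number of resets)."""
    resets = 0
    state = (total_steps, 0)
    while True:
        try:
            remaining, acc = state
            return step_chain(remaining, 0, max_depth, acc), resets
        except StackReset as reset:
            state = reset.state  # resume from a fresh, shallow stack
            resets += 1


print(run(1000, 200))  # (1000, 4)
print(run(1000, 100))  # (1000, 9)
```

A lower limit keeps the call stack shallower at the cost of more resets, which mirrors the "lower = more stable but slower" guidance for the build setting.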

Outputs:

scratch_llama2/stories260k_inference.sprite3: sprite, blocks hidden (fast editor/import)

scratch_llama2/stories260k_inference_visible.sprite3: sprite, blocks visible (debug)

scratch_llama2/stories260k_inference_visible.sb3: standalone project wrapper around the visible sprite

scratch_llama2/stories260k_inference_visible_scratch.sprite3: Scratch-compatible sprite (no TurboWarp-only blocks)

scratch_llama2/stories260k_inference_visible_scratch.sb3: Scratch-compatible standalone project

Run (TurboWarp)

Sprite workflow:

Import...
