GitHub - Broyojo/llm_from_scratch · GitHub
LLM From Scratch
Run the smallest llama2.c model (stories260K) inside Scratch/TurboWarp by compiling C inference code to Scratch blocks with llvm2scratch.
If everything is working, the sprite starts generating the familiar opening, streamed into the speech bubble token by token:
Once upon a time, ...
Live Demo
Scratch project: https://scratch.mit.edu/projects/1277883263
Credits (Upstream)
This repo vendors two upstream projects in-tree for reproducibility:
llama2.c by Andrej Karpathy (MIT). Source: llama2.c/ and llama2.c/LICENSE.
llvm2scratch by Classfied3D (MIT). Source: llvm2scratch/ and llvm2scratch/LICENSE.
The model/tokenizer artifacts in artifacts/ come from the llama2.c ecosystem.
How It Works
High-level pipeline:
scratch_llama2/build_stories260k_sprite3.py reads:
artifacts/stories260K.bin (the smallest llama2.c checkpoint)
artifacts/tok512.bin (tokenizer vocabulary)
It quantizes the weight matrices to Q8_0 (group size 4) and packs 4 signed int8 values into one u32.
It lays out everything into a single Scratch list !stack:
packed weights + per-group scales
RMSNorm weights
RoPE cos/sin tables (for a reduced SEQ_LEN)
runtime buffers (x/xb/hb/q/att + KV cache)
It writes scratch_llama2/generated_layout.h with 1-indexed addresses into !stack.
It compiles scratch_llama2/llama2_scratch.c to LLVM IR (scratch_llama2/llama2_scratch.ll) using:
clang --target=i386-none-elf (keeps pointers as 32-bit ints)
It runs llvm2scratch to turn LLVM IR into Scratch blocks, then exports .sprite3 and .sb3 outputs.
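The quantize, pack, and layout steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code in build_stories260k_sprite3.py: the function names, the little-endian byte order inside each u32, the max-abs/127 scale rule, and the `ADDR_*` macro names are all assumptions made for the example.

```python
import struct

GROUP_SIZE = 4  # the README states Q8_0 with group size 4


def quantize_q8_0(weights, group_size=GROUP_SIZE):
    """Quantize floats to int8 with one float scale per group (assumed rule: max-abs / 127)."""
    if len(weights) % group_size != 0:
        raise ValueError("weight count must be a multiple of the group size")
    quantized, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        wmax = max(abs(w) for w in group)
        scale = wmax / 127.0 if wmax > 0 else 1.0  # all-zero group: any scale works
        scales.append(scale)
        # Clamp so float rounding can never push a value outside [-127, 127].
        quantized.extend(max(-127, min(127, round(w / scale))) for w in group)
    return quantized, scales


def pack_int8x4(values):
    """Pack each run of four signed int8 values into one u32.

    Assumption: the first value goes into the lowest byte (little-endian).
    """
    if len(values) % 4 != 0:
        raise ValueError("value count must be a multiple of 4")
    words = []
    for i in range(0, len(values), 4):
        (word,) = struct.unpack("<I", struct.pack("<4b", *values[i:i + 4]))
        words.append(word)
    return words


def unpack_int8x4(word):
    """Inverse of pack_int8x4 for a single u32."""
    return list(struct.unpack("<4b", struct.pack("<I", word)))


def lay_out(regions):
    """Concatenate named regions into one flat list and record 1-indexed start addresses."""
    addresses, stack = {}, []
    for name, values in regions:
        addresses[name] = len(stack) + 1  # Scratch lists are 1-indexed
        stack.extend(values)
    return addresses, stack


def emit_header(addresses):
    """Render the addresses as C #define lines (macro names are hypothetical)."""
    return "\n".join(f"#define ADDR_{name.upper()} {addr}" for name, addr in addresses.items())


if __name__ == "__main__":
    q, scales = quantize_q8_0([0.5, -1.0, 0.25, 0.0, 2.0, 1.0, -2.0, 0.5])
    words = pack_int8x4(q)
    addresses, stack = lay_out([("weights", words), ("scales", scales), ("x", [0.0] * 4)])
    print(emit_header(addresses))
```

Packing four int8 values per list item cuts the number of Scratch list entries needed for weights by a factor of four, at the cost of an unpack step at inference time.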
Runtime UI:
!!output (list) stores generated token IDs.
!!vocab (list) stores token pieces (strings).
!!text (variable) accumulates decoded text; the sprite says it continuously.
!!resets (variable) increments when the compiler triggers a broadcast-based “stack reset” (progress indicator + avoids JS call stack blowups).
!!status (variable) shows a high-level state machine (Edit params... -> Running... -> Done.).
ui_* variables let you adjust sampling/generation settings from TurboWarp/Scratch UI.
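To make the roles of these lists and variables concrete, here is a small Python model of the runtime state. It is only a sketch: the class and function names are hypothetical, the toy vocabulary is invented, and Python's 0-based indexing stands in for Scratch's 1-indexed lists.

```python
from dataclasses import dataclass, field


@dataclass
class RuntimeState:
    vocab: list                                  # mirrors !!vocab (token pieces)
    output: list = field(default_factory=list)   # mirrors !!output (token IDs)
    text: str = ""                               # mirrors !!text (decoded so far)
    status: str = "Edit params..."               # mirrors !!status

    def emit(self, token_id):
        """Record one generated token and append its piece to the running text."""
        self.output.append(token_id)
        self.text += self.vocab[token_id]


def run(state, token_ids):
    """Walk the status through Running... to Done. while streaming tokens."""
    state.status = "Running..."
    for token_id in token_ids:
        state.emit(token_id)
    state.status = "Done."
    return state.text


if __name__ == "__main__":
    # Toy vocabulary for illustration only; the real pieces come from tok512.bin.
    state = RuntimeState(vocab=["Once", " upon", " a", " time"])
    print(run(state, [0, 1, 2, 3]))
```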
Build
Requires:
clang
uv (and Python >= 3.12; llvm2scratch requires it)
Command:
```shell
# If you don't have a usable Python yet:
#   uv python install 3.12

# Optional: tune stack reset frequency for TurboWarp stability/perf.
# Lower = more stable (less likely to hit "Maximum call stack size exceeded"), but slower.
# Higher = faster, but can crash in TurboWarp.
# MAX_BRANCH_RECURSION=200 is the default.
#
# Optional: number of tokens to generate (upper bound). Defaults to 20.
# (Must be
#
# llvm2scratch requires Python >= 3.12; pin via `--python` to avoid uv picking an older system Python.
MAX_BRANCH_RECURSION=200 \
GEN_STEPS=20 \
uv run --python 3.12 --no-project --with-editable ./llvm2scratch python scratch_llama2/build_stories260k_sprite3.py
```
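The two environment variables act as build-time knobs with the defaults stated above (200 and 20). The sketch below shows one way a Python build script could read them; it is not taken from build_stories260k_sprite3.py, and the function name and the positive-integer check are assumptions.

```python
import os

# Defaults as documented in the build command comments.
DEFAULTS = {"MAX_BRANCH_RECURSION": 200, "GEN_STEPS": 20}


def read_build_settings(env=None):
    """Read the build knobs from an environment mapping, falling back to the defaults."""
    env = os.environ if env is None else env
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        value = default if raw is None or raw == "" else int(raw)
        if value <= 0:
            # Assumed sanity check; the real script may validate differently.
            raise ValueError(f"{name} must be a positive integer, got {value}")
        settings[name] = value
    return settings


if __name__ == "__main__":
    print(read_build_settings())
```

Passing a plain dict instead of `os.environ` makes the parsing easy to check in isolation, for example `read_build_settings({"GEN_STEPS": "5"})`.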
Outputs:
scratch_llama2/stories260k_inference.sprite3: sprite, blocks hidden (fast editor/import)
scratch_llama2/stories260k_inference_visible.sprite3: sprite, blocks visible (debug)
scratch_llama2/stories260k_inference_visible.sb3: standalone project wrapper around the visible sprite
scratch_llama2/stories260k_inference_visible_scratch.sprite3: Scratch-compatible sprite (no TurboWarp-only blocks)
scratch_llama2/stories260k_inference_visible_scratch.sb3: Scratch-compatible standalone project
Run (TurboWarp)
Sprite workflow:
Import...