JeanKaddour/sokoban_speedrun: Teach Qwen3 Sokoban. The fastest recipe wins.
Sokoban Speedrun
Fastest recipe for RL fine-tuning Qwen3-4B-Instruct-2507 from 57% to >80% held-out pass@1 solve-rate on Sokoban puzzles, using a single 8xH100 node.
Play Sokoban if the task is unfamiliar.
World Record History
| Record time | FLOPs | Description | Date | Log | held-out pass@1 | Contributors |
| --- | --- | --- | --- | --- | --- | --- |
| 1:27:31 | 1.251 EFLOP | GRPO, LR 1.6e-6 annealed, 75 steps | 2026-06-17 | records/2026-06-17_01 | 0.891 (CI [0.86, 0.92]) | @JeanKaddour |
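To put the record row in perspective, the reported time and FLOP count imply an average compute rate. The sketch below only does that arithmetic. The README does not say how FLOPs are counted (for example, whether rollout generation is included), so the result is an implied average, not a measured utilization figure. `parse_hms` is a hypothetical helper, not part of the repository.

```python
def parse_hms(text):
    """Convert an H:MM:SS string into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


record_seconds = parse_hms("1:27:31")  # record time from the table
total_flops = 1.251e18                 # 1.251 EFLOP, as reported
num_gpus = 8                           # one 8xH100 node

# Average rate implied by the two reported numbers.
node_tflops = total_flops / record_seconds / 1e12
per_gpu_tflops = node_tflops / num_gpus

print(f"{record_seconds} s, {node_tflops:.1f} TFLOP/s per node, "
      f"{per_gpu_tflops:.1f} TFLOP/s per GPU")
```

Under these assumptions the run averages just under 30 TFLOP/s per GPU, which is one way to compare future records that trade wall-clock time against total compute.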
Rules
Fastest wall-clock run wins: one training run on one 8xH100 node, measured from training step 1 through final checkpoint write, whose final checkpoint clears the target.
Target: lower 95% bootstrap CI > 0.80 on datasets/sokoban_eval.jsonl.
Eval: 8 completions/puzzle, 12,288 tokens, temperature 0.8, top-p 0.95, seed 12345.
Fixed: model, train set, eval set, reward function, hardware.
Open: RL algorithm, loss, schedules, engine, parallelism, domain-agnostic rewards, prompt.
Not allowed: Sokoban-specific hints, heuristics, or few-shot examples.
Verification: maintainers rerun at a second seed; both runs must clear the target.
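The target rule can be illustrated with a small sketch. The README does not describe the bootstrap procedure that eval_speedrun.py uses, so this assumes a percentile bootstrap that resamples puzzles with replacement. The function names, the number of resamples, and the input layout (one list of 0/1 outcomes per puzzle, 8 per puzzle as in the eval rule) are all assumptions made for illustration.

```python
import random


def pass_at_1(per_puzzle_outcomes):
    """Mean over puzzles of the fraction of completions that solved the puzzle."""
    scores = [sum(o) / len(o) for o in per_puzzle_outcomes]
    return sum(scores) / len(scores)


def bootstrap_ci(per_puzzle_outcomes, n_resamples=2000, alpha=0.05, seed=12345):
    """Percentile bootstrap CI for pass@1, resampling whole puzzles (assumed scheme)."""
    rng = random.Random(seed)
    scores = [sum(o) / len(o) for o in per_puzzle_outcomes]
    n = len(scores)
    estimates = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    lower = estimates[int(alpha / 2 * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


def clears_target(per_puzzle_outcomes, target=0.80):
    """The rule as stated: the lower 95% bound must be strictly above the target."""
    lower, _ = bootstrap_ci(per_puzzle_outcomes)
    return lower > target


# Synthetic example: 200 puzzles, 8 completions each, 7 of 8 solved on every puzzle.
outcomes = [[1, 1, 1, 1, 1, 1, 1, 0] for _ in range(200)]
print(pass_at_1(outcomes), bootstrap_ci(outcomes), clears_target(outcomes))
```

Because the rule tests the lower bound and not the point estimate, a checkpoint whose pass@1 sits only slightly above 0.80 can still fail. Under the verification rule, the same check would have to pass for the maintainers' second-seed rerun as well.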
Submit
Train, then eval the final checkpoint. Logs, rollouts, source snapshots, and eval JSON are written automatically.
Run python make_record_report.py records/ and fill in the Idea section.
Open a PR adding the record directory plus a leaderboard row. CI runs python verify_record.py records/.
Running the current record
On a local 8xH100 node:
    NODE_GPUS=8 torchrun --standalone --nproc_per_node=3 -m speedrun
    python -m eval_speedrun --eval-checkpoint outputs/<run>/step_000075
Modal
modal_app.py rents an 8xH100 box on Modal. Upload the datasets once, then start a run and eval its checkpoint after it finishes:
    modal volume put nanochat-rl-hf datasets/sokoban_train.jsonl /datasets/sokoban_train.jsonl
    modal volume put nanochat-rl-hf datasets/sokoban_eval.jsonl /datasets/sokoban_eval.jsonl
    modal run --detach modal_app.py
    EVAL_CHECKPOINT=latest modal run modal_app.py
Credits
Thanks to @joshua-a-harris and his nanoRL speedrun, nanochat, modded-nanoGPT, nanoRL, ScaleRL, and ReasoningGym.