ESP32 512kB RAM with Tailscale, English to Python LLM and 8 containers local

PySpell — sandboxed Rust/Python expressions, live on ESP32

Try it live — the model runs in your browser

No install, no device, no cloud. The same ~0.5 MB model + sandbox that runs on the ESP32, loaded here as WebAssembly. Type plain English; it writes Python and runs it.


Runs 100% in your browser. Want it reachable from your phone? Use Connect Tailscale below to turn this tab into a private node on your own tailnet.

🔗 Connect Tailscale — make this tab a private node on your own tailnet (reachable from your phone)

Opt-in — nothing connects until you click. Our Rust Tailscale client (~321 kB WASM) joins your tailnet over WebSocket; node keys stay in this browser. Then open http:/// from any device on your tailnet to get this same demo, over the tunnel.


Pointers worth following:
• Repo: https://github.com/punnerud/pyspell
• Deep-dive: https://github.com/punnerud/pyspell/blob/main/tech.md
• 512 kB memory: https://github.com/punnerud/pyspell/blob/main/docs/memory-512kb.md

Capabilities: no_std+alloc PySpell evaluator (Rust & Python subsets, deny-by-default sandbox), a ~0.45 M-param on-device model (English→Python), a full Tailscale node (control + DERP) on a 512 kB ESP32-S3, a native MCP server, a REST /run API, and the same Tailscale client compiled to WASM so a browser tab can join the user's own tailnet.

Documentation, examples & how it works

language reference · the offline AI agent · the 512 kB tricks · sandbox & limits (everything, tucked away to keep the demo clean)

The chip serves a web agent IDE, a native MCP server, and a REST API, runs the code in a sandbox (with live web-request / fetch support), and drives its own screen + LED, with up to 8 parallel PySpell processes on that same half-megabyte of RAM. Like MicroPython, but with two syntaxes, a parser that never ships to the device, and English as the third syntax.

no_std + alloc core · Rust & Python front-ends · ~62 kB on ESP32 · deny-by-default sandbox · live over Tailscale · 512 kB SRAM, no PSRAM · 0.45 M-param model, in-browser · offline AI agent

What it is

A PySpell program is a single expression (Python) or some let bindings followed by a trailing expression (Rust). It evaluates to a value — a number, a boolean, a string, or a list. Free identifiers are resolved at evaluation time against a host-supplied environment: CLI variables on a laptop, or live device readings on a microcontroller. The only I/O is a host-granted, allowlisted fetch_json; there are no loops, functions, or imports — that is the point: small, fast, and safe to accept from elsewhere.
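To make the "single expression resolved against a host environment" idea concrete, here is a minimal sketch in Python using the standard ast module. It is an illustration only, not PySpell's evaluator (which is a no_std Rust core); the names evaluate, BIN_OPS, and CMP_OPS and the exact set of allowed operators are assumptions made for this example.

```python
import ast
import operator

# Deny-by-default: only the node types and operators listed here are evaluated.
# (Illustrative allowlist; the real PySpell grammar is not reproduced here.)
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
CMP_OPS = {ast.Lt: operator.lt, ast.Gt: operator.gt, ast.Eq: operator.eq}


def evaluate(source: str, env: dict):
    """Evaluate one expression; free identifiers are looked up in `env`."""
    tree = ast.parse(source, mode="eval")  # mode="eval" accepts a single expression
    return _eval(tree.body, env)


def _eval(node, env):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, bool, str)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise NameError(f"unknown identifier: {node.id}")
        return env[node.id]  # host-supplied value, e.g. a device reading
    if isinstance(node, ast.List):
        return [_eval(item, env) for item in node.elts]
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        return BIN_OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in CMP_OPS):
        return CMP_OPS[type(node.ops[0])](
            _eval(node.left, env), _eval(node.comparators[0], env))
    # Calls, attribute access, lambdas, comprehensions... all fall through here.
    raise ValueError(f"not allowed: {type(node).__name__}")
```

With a hypothetical reading temp_c supplied by the host, evaluate("temp_c * 9 / 5 + 32", {"temp_c": 20}) computes a Fahrenheit value, while evaluate("__import__('os')", {}) is rejected because function calls are not on the allowlist.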

"Micro-containers" — the direction, honestly stated. The aim is lightweight, pushable units of code on tiny devices. Today it's a sandboxed evaluator, not OS containers: the sandbox is at the language level (deny-by-default grammar + an instruction budget), jobs share one device, and it runs a safe Python/Rust subset — not full Python. Truly parallel, isolated containers need more RAM than the ESP32-S3 has (no PSRAM). So: a small, safe evaluator as the first step toward the micro-container vision.
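The instruction budget mentioned above can be sketched as a counter that every evaluation step has to pay into. This is a hedged illustration under assumed names (Budget, BudgetExceeded, run) and an assumed cost model of one step per AST node; how PySpell actually counts instructions is not described here.

```python
import ast


class BudgetExceeded(Exception):
    """Raised when an expression needs more steps than it was granted."""


class Budget:
    def __init__(self, steps: int):
        self.remaining = steps

    def spend(self) -> None:
        if self.remaining <= 0:
            raise BudgetExceeded("instruction budget exhausted")
        self.remaining -= 1


def run(source: str, steps: int):
    """Evaluate a small arithmetic expression within `steps` node visits."""
    budget = Budget(steps)
    return _eval(ast.parse(source, mode="eval").body, budget)


def _eval(node, budget: Budget):
    budget.spend()  # assumed cost model: every node visit costs one step
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
        left = _eval(node.left, budget)
        right = _eval(node.right, budget)
        return left + right if isinstance(node.op, ast.Add) else left * right
    raise ValueError(f"not allowed: {type(node).__name__}")
```

The expression 1 + 2 * 3 visits five nodes, so run("1 + 2 * 3", 5) succeeds and run("1 + 2 * 3", 4) raises BudgetExceeded. Because the budget is checked before any work is done for a node, an oversized job is stopped rather than left to starve the others sharing the device.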

Two ways to compile. On the host, full-fidelity front-ends use syn (Rust) and rustpython-parser (Python). For "type code in a browser and run it on the chip", a tiny hand-written parser (a few kB, no_std) builds the same AST on the device. Either way: source → AST → evaluate.
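The source → AST → evaluate pipeline with a small hand-written parser can be sketched as follows. This is an assumption-laden illustration in Python: the tuple-shaped AST, the function names, and the arithmetic-only grammar are invented for the example and do not mirror PySpell's on-device parser, which is written in no_std Rust.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def tokenize(src: str) -> list:
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(src: str):
    """Recursive descent: expr := term (('+'|'-') term)*."""
    tokens = tokenize(src)
    node, i = _expr(tokens, 0)
    if i != len(tokens):
        raise SyntaxError("trailing tokens")
    return node


def _expr(tokens, i):
    node, i = _term(tokens, i)
    while i < len(tokens) and tokens[i] in ("+", "-"):
        op = tokens[i]
        right, i = _term(tokens, i + 1)
        node = ("bin", op, node, right)
    return node, i


def _term(tokens, i):
    node, i = _atom(tokens, i)
    while i < len(tokens) and tokens[i] in ("*", "/"):
        op = tokens[i]
        right, i = _atom(tokens, i + 1)
        node = ("bin", op, node, right)
    return node, i


def _atom(tokens, i):
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok.isdigit():
        return ("num", int(tok)), i + 1
    if tok == "(":
        node, i = _expr(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("expected ')'")
        return node, i + 1
    if tok[0].isalpha() or tok[0] == "_":
        return ("name", tok), i + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(node, env: dict):
    """Walk the AST; names are resolved against the host environment."""
    if node[0] == "num":
        return node[1]
    if node[0] == "name":
        return env[node[1]]
    _, op, left, right = node
    return OPS[op](evaluate(left, env), evaluate(right, env))
```

Because the evaluator only ever sees the AST, a full-fidelity host parser and a tiny device parser are interchangeable as long as they emit the same node shapes. For example, evaluate(parse("2 * (x + 3)"), {"x": 4}) walks a tree that either front-end could have produced.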

An offline AI coding agent, served off the chip

Open http:/// over the tunnel and you get a Cursor-like agent. Type "flash the light", "show the text "hello"", "what is 7 plus 5", or "reverse the word robot", and a ~0.45 M-parameter language model turns the request into Python.

A model that small is only useful because of a chain of tricks; the full write-up is in tech.md. The headlines:

The model points, the browser copies

A 0.45 M model can't reliably copy arbitrary tokens (numbers, strings, lists), so it isn't asked to. It emits tiny semantic directives; the browser copies the literal content verbatim. For example, "calculate 3 + 2" → print(3 + 2), and "change add to subtract" → @@ + ==> -. Quoted text is literal content: it is copied byte-for-byte and excluded from vocab checks.
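One way to picture the split between "pointing" and "copying" is the sketch below. Only the @@ OLD ==> NEW edit form comes from the text above; the slot syntax (<0>, <1>), the helper names, the literal-extraction regex, and the first-match replacement rule are all assumptions made for illustration, not the project's actual directive format.

```python
import re


def extract_literals(request: str) -> list:
    """Collect spans the model should never have to reproduce itself:
    quoted text (kept byte-for-byte) and simple numeric expressions."""
    pattern = r'"[^"]*"|\d+(?:\s*[-+*/]\s*\d+)*'
    return re.findall(pattern, request)


def fill(template: str, literals: list) -> str:
    """Replace hypothetical slot markers <0>, <1>, ... with the copied spans."""
    return re.sub(r"<(\d+)>", lambda m: literals[int(m.group(1))], template)


def apply_patch(code: str, directive: str) -> str:
    """Apply an edit directive of the form '@@ OLD ==> NEW'.
    Assumption: only the first occurrence of OLD is replaced."""
    body = directive.removeprefix("@@")  # Python 3.9+
    old, new = (part.strip() for part in body.split("==>", 1))
    return code.replace(old, new, 1)


# The model would emit only the short template; the host copies the content.
literals = extract_literals("calculate 3 + 2")
code = fill("print(<0>)", literals)
patched = apply_patch(code, "@@ + ==> -")
```

In this sketch the model's output stays inside a tiny vocabulary (a template and a slot index), while numbers and quoted strings travel from the request to the program without passing through the model at all.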

The device serves; the browser computes

Inference runs in WebAssembly, client-side. The 0.5 MB model image streams off flash a TCP segment at a time (HTTP Range) and is never resident in the chip's ~60 kB heap. Inverted edge inference: the constrained device serves and grades, the browser runs the model.
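The chunked transfer described above relies on HTTP Range requests. The sketch below models both sides as pure functions so the arithmetic is easy to check: the client asks for fixed-size windows and the server returns exactly those bytes. The function names and the 1460-byte chunk size (a common TCP payload size on Ethernet) are assumptions for the example; a real device would read each window from flash instead of slicing an in-memory bytes object, and suffix ranges such as bytes=-500 are not handled here.

```python
def parse_range(header: str, total: int) -> tuple:
    """Parse 'bytes=START-END' (END inclusive, optional) and clamp to the size."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only byte ranges are supported")
    start_s, _, end_s = spec.partition("-")
    start = int(start_s)
    end = min(int(end_s), total - 1) if end_s else total - 1
    if start > end:
        raise ValueError("unsatisfiable range")
    return start, end


def serve_range(image: bytes, header: str) -> tuple:
    """Server side: return (Content-Range value, body) for one request."""
    start, end = parse_range(header, len(image))
    return f"bytes {start}-{end}/{len(image)}", image[start:end + 1]


def range_headers(total: int, chunk: int):
    """Client side: yield the Range header for each consecutive window."""
    for start in range(0, total, chunk):
        yield f"bytes={start}-{min(start + chunk, total) - 1}"


def fetch_all(image: bytes, chunk: int = 1460) -> bytes:
    """Reassemble the whole image one window at a time."""
    parts = [serve_range(image, h)[1] for h in range_headers(len(image), chunk)]
    return b"".join(parts)
```

The point of the design is that the server never needs a buffer larger than one window, which is what lets a 0.5 MB image be served from a device whose heap is only about 60 kB.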

Frozen embeddings, distilled

The 512-token vocab is embedded with all-MiniLM (22 M params), PCA'd to 128 dims, folded with a part-of-speech vector, and frozen — the tiny model starts...
