CrankGPT — fully offline, human-powered local AI | CrankGPT
© 2026 Squeez Labs
CrankGPT is a fully offline and off-the-grid AI box.
Our current demos are variations on voice assistants—turn the crank, say something, get a response—but we’ve generated images (small), made poetry (bad), and written code using the same setup. There’s no battery or cloud. Just a hand crank, a little computer, and a small stack of speech and language models running locally. Provided the electronics are kept dry and at a reasonable temperature, there’s no reason this thing won’t still work in a thousand years.
As will be familiar to anyone who has ever undertaken a hardware project, it took about a week to build a proof of concept and many months of kernel optimizations, board revisions, code refactors, and CAD tweaks to get to a thing that works as we envisioned. This article walks through how we built it: the hardware, the local voice agent stack, and the engineering required to make a conversation feel real on a device this small.
Motivations
For something to have “smarts” currently assumes a wall socket and a data center. CrankGPT is a small argument that neither has to be true.
Local models are private models. Why give away what we don’t have to?
It offended our European small-practical-car sensibilities to see people around us throwing kilowatts and thousands of tokens at tasks small models could accomplish just as well as huge ones, for a fraction of the cost and energy.
Everyone is busy making things bigger. We figured opportunities abound to make things smaller.
Hardware
Single Board Computer
We used a stock Raspberry Pi 5 with 8GB RAM and a cooling fan HAT. There are better-performing SBCs for the same price (an Orange Pi with its faster DDR5 RAM is an even better fit for LLM inference as we’ll discuss below), but it’s hard to beat the Pi’s accessibility and software ecosystem. The Pi runs speech recognition, a language model, and text-to-speech locally on CPU (no accelerators).
Audio
We used the KEYESTUDIO ReSpeaker 2-Mic Pi HAT: an all-in-one audio I/O solution for Pi designed specifically for voice assistants. It includes a stereo MEMS mic array and various audio outputs (we used the older version with the WM8960 codec). It sits directly on the Pi’s GPIO headers and has decent far-field mic performance, even within an enclosure.
Power
We chose a cheap off-the-shelf, switchable-voltage 20W hand-crank generator marketed for emergency USB charging. The Pi normally draws around 1.5A, but when it’s working hard (as it does when doing inference on the CPU), its current requirements can increase substantially. That can make the generator voltage sag below the Pi’s required 4.8V. A momentary 5A spike can even trigger the generator’s internal overcurrent protection, which shuts off the voltage output entirely. Either way, the Pi browns out.
To ensure the Pi sees a steady voltage when the full inference stack kicks in (and to afford crankers a little rest), we built a custom capacitor board to smooth out the generator’s output and act as a short-term (~20 second) power reservoir.
You can feel that load curve through the crank: when LLM inference and speech synthesis run together, the crank gets a lot harder to turn.
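As a back-of-the-envelope check on what a ~20 second reservoir implies, the sketch below uses the standard capacitor energy formula, E = ½·C·(V_full² − V_min²), to estimate how much capacitance is needed to carry a given load for a given time. Every number in the example (load power, voltage window, converter efficiency) is an illustrative assumption, not a value taken from the actual capacitor board.

```python
def usable_energy_joules(capacitance_f: float, v_full: float, v_min: float) -> float:
    """Energy released when a capacitor discharges from v_full down to v_min."""
    if v_full <= v_min:
        raise ValueError("v_full must be greater than v_min")
    return 0.5 * capacitance_f * (v_full**2 - v_min**2)


def required_capacitance_f(load_w: float, hold_s: float,
                           v_full: float, v_min: float,
                           efficiency: float = 1.0) -> float:
    """Capacitance needed to supply load_w for hold_s seconds.

    efficiency models losses in whatever regulator sits between the
    capacitors and the load (1.0 means lossless).
    """
    if v_full <= v_min:
        raise ValueError("v_full must be greater than v_min")
    energy_from_caps = load_w * hold_s / efficiency
    return 2.0 * energy_from_caps / (v_full**2 - v_min**2)


if __name__ == "__main__":
    # Hypothetical numbers: 5 V x 1.5 A load, 20 s hold time, capacitor bank
    # swinging between 12 V and 6 V, 90% efficient regulator.
    c = required_capacitance_f(load_w=7.5, hold_s=20.0,
                               v_full=12.0, v_min=6.0, efficiency=0.9)
    print(f"required capacitance: {c:.2f} F")
```

Because the energy term depends on the difference of squared voltages, a wider usable voltage window shrinks the required capacitance quickly, which is one reason a reservoir like this is usually paired with a regulator rather than placed directly on the 5V rail.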
Software
Operating system
When you’re cranking, every second counts: the minute or so it takes Raspbian to boot up feels like an eternity. DietPi is a minimalistic, stripped-down Debian-based image that prioritizes fast boot time over lots of immediately available default services. It shrank our startup time substantially, and turning off unneeded radio services (Bluetooth, Wi-Fi, etc.) reduced it even further: from Linux boot to a usable userspace in around 3 seconds.
Voice agent
We wrote our own edge voice agent optimized for Raspberry Pi-class boards. We built it from scratch rather than on top of an existing framework (such as Pipecat) because we wanted to understand the system end to end and have as few dependencies as possible. The pipeline is the obvious one, with every stage tuned for minimal latency on CPU:
Automatic Speech Recognition (ASR) + Voice Activity Detection (VAD)
LLM
Text-to-Speech (TTS)
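The three stages above can be wired together as in the minimal sketch below. It is not the project's code: `transcribe`, `generate_tokens`, and `synthesize` are hypothetical placeholders for the ASR, LLM, and TTS stages, and handing text to TTS one sentence at a time is a common latency trick assumed here, not something the article confirms.

```python
from typing import Callable, Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed LLM tokens into sentences so TTS can start early."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush whatever is left when the model stops mid-sentence.
        yield buffer.strip()


def run_turn(audio: bytes,
             transcribe: Callable[[bytes], str],
             generate_tokens: Callable[[str], Iterable[str]],
             synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """One conversational turn: ASR, then LLM, then TTS per sentence."""
    text = transcribe(audio)
    if not text.strip():
        return  # nothing was said, so produce no audio
    for sentence in sentences_from_tokens(generate_tokens(text)):
        yield synthesize(sentence)


if __name__ == "__main__":
    # Stub stages standing in for real models.
    chunks = run_turn(
        b"",
        transcribe=lambda audio: "hello",
        generate_tokens=lambda prompt: iter(["Hi", " there", ".", " Bye", "!"]),
        synthesize=lambda sentence: sentence.encode("utf-8"),
    )
    for chunk in chunks:
        print(chunk)
```

Keeping each stage behind a plain callable makes it easy to swap models or to time each stage in isolation.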
Speech recognition
Moonshine ASR turned out to be by far the fastest option for CPU-based ASR. It’s slightly less robust in noisy environments (relevant in our scenario with a noisy crank) or on accented speech compared to Whisper-base-sized models or NVIDIA’s FastConformer. But we optimized for low latency given our goal of a real-time voice agent. For endpointing, we use Silero VAD.
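To illustrate what endpointing involves, here is a minimal sketch of the decision logic that typically sits on top of a VAD. It assumes the VAD emits one speech probability per audio frame. The function name, the threshold, and the frame counts are hypothetical, and no Silero API is called.

```python
from typing import Iterable, Optional


def find_endpoint(speech_probs: Iterable[float],
                  threshold: float = 0.5,
                  min_speech_frames: int = 3,
                  min_silence_frames: int = 10) -> Optional[int]:
    """Return the frame index at which the utterance is considered finished.

    An utterance starts after min_speech_frames consecutive speech frames and
    ends after min_silence_frames consecutive non-speech frames. Returns None
    if no complete utterance is found.
    """
    speech_run = 0
    silence_run = 0
    in_utterance = False
    for index, prob in enumerate(speech_probs):
        if prob >= threshold:
            speech_run += 1
            silence_run = 0
            if speech_run >= min_speech_frames:
                in_utterance = True
        else:
            silence_run += 1
            speech_run = 0
            if in_utterance and silence_run >= min_silence_frames:
                return index
    return None


if __name__ == "__main__":
    # Synthetic probabilities: silence, a short utterance, then silence.
    probs = [0.1] * 5 + [0.9] * 4 + [0.1] * 10
    print(find_endpoint(probs))
```

The silence window is the main latency knob: a shorter window makes the agent respond sooner but risks cutting speakers off during natural pauses.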
Language model(s)
The LLM runs on llama.cpp. Our preferred models are small Liquid AI LFM2 variants (e.g. 350M or 1.2B), along with Gemma 3 in its 1B form.
Raspberry Pi 5 performance measured using llama.cpp (llama-bench with pp512 and tg128, 4 threads each):
model | quant | memory | prefill t/s | gen t/s
lfm2.5...