Lethe — Brain-Centric AI Assistant
Brain-inspired · Persistent · Written in Rust · Open source
An AI that actually thinks when you're not talking to it.
Lethe is a cognitive runtime with brain regions that run on their own clocks: one that talks, one that remembers, one that drifts through your old conversations at 3 AM looking for things that slipped. She lives wherever you put her, on your machine or ours, and she never stops thinking.
Try Lethe free →
Self-host it
She's already running on our side — sign in and start talking. Free for two weeks, card only if you keep her.
// what this changes
She holds the thread for as long as you do.
She remembers the work, notices what slipped, and decides on her own when something is worth interrupting you for.
two weeks later
you: how's the thing going
lethe: The Springer resubmission? You uploaded the revised manuscript on the 2nd, but I never saw a confirmation email come through. Want me to check the submission portal?
mid-conversation
you: audit the deployment scripts
lethe: Already on it. Sent a subagent to do the review. It'll check in every couple of minutes and ping me with the result. What else?
unprompted
lethe: You haven't mentioned the visa renewal since March 12. The window closes in 9 days. Want me to draft the email to the consulate, or are you already on it?
// architecture
The brain names aren't metaphors.
Each region is a real actor with its own clock and its own logs, mapped directly to neuroscience.
cortex
The voice. Picks tools, delegates work, decides when to speak and when to shut up and let you think.
hippocampus
Memory with opinions. Retrieves what's load-bearing right now and lets the rest fade.
dmn
Default-mode network. Runs while you're away: drifts across goals, connects dots, catches what everyone else missed.
brainstem
The brainstem. Boots the system, watches resources, keeps the process alive. You never talk to it. That's the point.
subagents
Disposable workers she spins up for a job and throws away when it's done. She keeps talking while they work.
attention gate
Filters background thoughts. Most aren't worth your time. The ones that are, get through.
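One way to picture the attention gate is as a threshold on a salience score. The sketch below is illustrative only and is not taken from Lethe's source: the `Thought`, `Verdict`, and `AttentionGate` names, the `salience` field, and the single fixed threshold are all assumptions.

```rust
/// A background thought with a salience score in [0.0, 1.0].
/// Hypothetical shape, not Lethe's real type.
#[derive(Debug, Clone, PartialEq)]
pub struct Thought {
    pub text: String,
    pub salience: f32,
}

/// What the gate decides for a single thought.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Surface, // worth interrupting the user
    Hold,    // keep it in the background
}

/// Assumed design: one fixed threshold decides what gets through.
pub struct AttentionGate {
    pub threshold: f32,
}

impl AttentionGate {
    /// Decide whether one thought clears the threshold.
    pub fn review(&self, thought: &Thought) -> Verdict {
        if thought.salience >= self.threshold {
            Verdict::Surface
        } else {
            Verdict::Hold
        }
    }

    /// Keep only the thoughts that clear the threshold, most salient first.
    pub fn filter(&self, thoughts: Vec<Thought>) -> Vec<Thought> {
        let mut kept: Vec<Thought> = thoughts
            .into_iter()
            .filter(|t| t.salience >= self.threshold)
            .collect();
        kept.sort_by(|a, b| b.salience.total_cmp(&a.salience));
        kept
    }
}
```

A real gate would likely weigh context and timing as well; a single number keeps the idea visible.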
01:24:18  dmn          background cognition complete. found possible deadline drift
01:24:19  hippocampus  recall triggered. 2 notes, 3 conversation matches, salience bias active
01:24:20  cortex       delegation decision. spawned subagent: deployment audit
01:26:20  subagent     progress report. checked install path, reviewing update path
01:26:21  attention    notification reviewed. held for cortex decision
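The "own clock, own logs" idea behind a trace like this can be sketched with one thread per region, each ticking on its own period and reporting over a shared channel. This is a minimal sketch using only the Rust standard library. It is not Lethe's implementation: `Event`, `run_regions`, and the periods passed in are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// One entry emitted by a region. Hypothetical shape.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub region: &'static str,
    pub tick: u32,
}

/// Spawn one thread per region. Each sleeps for its own period between
/// ticks and sends an `Event` on a shared channel. Returns the events in
/// the order they arrived.
pub fn run_regions(regions: &[(&'static str, Duration)], ticks: u32) -> Vec<Event> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for &(name, period) in regions {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for tick in 1..=ticks {
                thread::sleep(period);
                // A send error only means the receiver is gone; ignore it.
                let _ = tx.send(Event { region: name, tick });
            }
        }));
    }

    // Drop the original sender so the channel closes once every
    // region thread has finished and dropped its clone.
    drop(tx);

    let events: Vec<Event> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("region thread panicked");
    }
    events
}
```

Calling `run_regions(&[("cortex", Duration::from_millis(10)), ("dmn", Duration::from_millis(50))], 3)` interleaves fast and slow regions in arrival order, much as the timestamps above interleave `dmn`, `hippocampus`, and `cortex`.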
// principles
A cognitive runtime.
A brain has parts. So does she.
Five brain regions, each on its own clock, each doing one job well. Closer to how a brain works than to anything else in this space.
Swap the brain, keep the person.
Her memory survives model swaps, reboots, and new hardware. Who she is isn't tied to any one weight set. Rebuild her tomorrow and she'll still remember today.
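The separation this describes, memory on one side and a replaceable model on the other, can be sketched as a struct that owns its notes and holds the model behind a trait object. Everything here is an assumption made for illustration: the `Backend` trait, `EchoBackend`, and `Assistant` are hypothetical names, and the sketch keeps memory in RAM, so it shows model swaps only, not persistence across reboots.

```rust
/// Anything that can turn a prompt plus recalled notes into a reply.
/// Hypothetical trait, not Lethe's API.
pub trait Backend {
    fn reply(&self, prompt: &str, recalled: &[String]) -> String;
}

/// Stand-in model that just echoes what it was given.
pub struct EchoBackend {
    pub label: String,
}

impl Backend for EchoBackend {
    fn reply(&self, prompt: &str, recalled: &[String]) -> String {
        format!("[{}] {} (recalling {} notes)", self.label, prompt, recalled.len())
    }
}

/// Memory lives beside the backend, not inside it.
pub struct Assistant {
    memory: Vec<String>,
    backend: Box<dyn Backend>,
}

impl Assistant {
    pub fn new(backend: Box<dyn Backend>) -> Self {
        Assistant { memory: Vec::new(), backend }
    }

    /// Store a note in memory.
    pub fn remember(&mut self, note: &str) {
        self.memory.push(note.to_string());
    }

    /// Replace the model; the stored notes are untouched.
    pub fn swap_backend(&mut self, backend: Box<dyn Backend>) {
        self.backend = backend;
    }

    /// Answer using whichever backend is current, with all notes available.
    pub fn ask(&self, prompt: &str) -> String {
        self.backend.reply(prompt, &self.memory)
    }

    pub fn memory(&self) -> &[String] {
        &self.memory
    }
}
```

Because `swap_backend` only replaces the boxed trait object, a note stored under one model is still passed to the next one on the following `ask`.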
One Rust binary. Yours.
~50 MB, statically linked, boots in milliseconds. One file you drop in, with none of the Python-and-container pinball to wire up first. Sits as a systemd service and swaps between Anthropic, OpenAI, OpenRouter, or local Gemma without touching anything else.
// get started
Two minutes to memory.
Rather not run it yourself? Try hosted Lethe — free for two weeks →
Cloud LLM
Install
One command. Works on macOS and Linux.
curl -fsSL https://lethe.gg/install | bash
Say hello
Message your bot on Telegram. From this point on, she remembers.
// you'll need
A Telegram Bot Token: message @BotFather, send /newbot
An LLM API key or subscription: OpenRouter, Anthropic (API key or Claude subscription), or OpenAI
Your Telegram User ID: message @userinfobot
Local (Gemma 4)
Build llama.cpp
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build -DGGML_CUDA=ON && cmake --build build -j$(nproc)
Start the model server
Download a Gemma 4 31B GGUF and run:
llama-server --model gemma-4-31B-it-Q8_0.gguf \
--split-mode tensor --jinja --reasoning-budget 4096 \
--ctx-size 98304 --parallel 2 --flash-attn on -fit off
Install Lethe & configure
curl -fsSL https://lethe.gg/install | bash
# then set in .env:
LLM_PROVIDER=openai
LLM_API_BASE=http://localhost:8080/v1
OPENAI_API_KEY=local
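A quick way to sanity-check a `.env` file like the one above is to parse its `KEY=VALUE` lines and look for missing settings. The sketch below uses only the Rust standard library and is not how Lethe itself reads its configuration; `parse_env` and `missing_keys` are hypothetical helpers, and quoting or `export` prefixes are deliberately not handled.

```rust
use std::collections::HashMap;

/// Parse `KEY=VALUE` lines, skipping blank lines and `#` comments.
/// Quoted values and `export` prefixes are not handled in this sketch.
pub fn parse_env(text: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split on the first '=' only, so values may contain '=' themselves.
        if let Some((key, value)) = line.split_once('=') {
            vars.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    vars
}

/// Return the required keys that are absent from the parsed file.
pub fn missing_keys<'a>(vars: &HashMap<String, String>, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|key| !vars.contains_key(*key))
        .collect()
}
```

Passing the three variable names shown above as `required` returns an empty list when the file is complete.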
// you'll need
GPUs with about 48 GB of VRAM or more (2x RTX 4090 for Q4, 4x RTX 4090 for Q8)
A Telegram Bot Token
A Gemma 4 31B GGUF model file
See the full local setup guide in the README.
// hosted
Let us run her.
Same Lethe (the memory, the background thinking, the 3 AM drift), except she runs on our servers instead of yours. We keep her up; you just talk to her, in the browser or on Telegram. Free to start, $19.95 a month once she's earned it.
Start free →
Or self-host it →
Free for two weeks. Keep her if she's worth it, walk if she isn't.