Lethe – Brain-Centric AI Assistant


Brain-inspired · Persistent · Written in Rust · Open source

An AI that actually thinks when you're not talking to it.

Lethe is a cognitive runtime with brain regions that run on their own clocks: one that talks, one that remembers, one that drifts through your old conversations at 3 AM looking for things that slipped. She lives wherever you put her — your machine, or ours — and she never stops thinking.

Try Lethe free → Self-host it

She's already running on our side — sign in and start talking. Free for two weeks, card only if you keep her.

// what this changes

She holds the thread for as long as you do.

She remembers the work, notices what slipped, and decides — on her own — when something is worth interrupting you for.

two weeks later

you: how's the thing going

lethe: The Springer resubmission? You uploaded the revised manuscript on the 2nd but I never saw a confirmation email come through. Want me to check the submission portal?

mid-conversation

you: audit the deployment scripts

lethe: Already on it — sent a subagent to do the review. It'll check in every couple of minutes and ping me with the result. What else?

unprompted

lethe: You haven't mentioned the visa renewal since March 12. The window closes in 9 days. Want me to draft the email to the consulate, or are you already on it?

// architecture

The brain names aren't metaphors.

Each region is a real actor with its own clock and its own logs, named after the brain structure whose job it does.

cortex

The voice. Picks tools, delegates work, decides when to speak and when to shut up and let you think.

hippocampus

Memory with opinions. Retrieves what's load-bearing right now and lets the rest fade.

dmn

Default-mode network. Runs while you're away — drifts across goals, connects dots, catches what everyone else missed.

brainstem

Life support. Boots the system, watches resources, keeps the process alive. You never talk to it. That's the point.

subagents

Disposable workers she spins up for a job and throws away when it's done. She keeps talking while they work.

attention gate

Filters background thoughts. Most aren't worth your time. The ones that are, get through.
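One way to picture the gate is a salience threshold: each background thought carries a score, and only those at or above the bar get through. The sketch below is a minimal illustration in Rust, not Lethe's actual code. The `Thought` and `AttentionGate` types, the `salience` field, and the threshold value are all assumptions made for the example.

```rust
/// A background thought produced by some region (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Thought {
    source: &'static str, // region that produced the thought, e.g. "dmn"
    text: String,
    salience: f32, // assumed score in 0.0..=1.0, higher means more urgent
}

/// Hypothetical gate: lets through thoughts at or above a salience threshold.
struct AttentionGate {
    threshold: f32,
}

impl AttentionGate {
    /// Consumes the candidate thoughts and returns only those worth surfacing.
    fn filter(&self, thoughts: Vec<Thought>) -> Vec<Thought> {
        thoughts
            .into_iter()
            .filter(|t| t.salience >= self.threshold)
            .collect()
    }
}
```

With `AttentionGate { threshold: 0.7 }`, a thought scored 0.9 passes and one scored 0.2 is dropped. A real gate would likely weigh more than a single number, but the shape is the same: most thoughts stop here.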

01:24:18 dmn background cognition complete. found possible deadline drift

01:24:19 hippocampus recall triggered. 2 notes, 3 conversation matches, salience bias active

01:24:20 cortex delegation decision. spawned subagent: deployment audit

01:26:20 subagent progress report. checked install path, reviewing update path

01:26:21 attention notification reviewed. held for cortex decision
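To make "its own clock" concrete, here is a small sketch using only the Rust standard library: each region runs on its own thread, sleeps for its own period, and reports ticks over a shared channel. It illustrates the actor pattern in general and is not taken from Lethe's source. The `Tick` type, the `run_regions` function, and the periods in the usage note are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// One tick reported by a region (hypothetical type for this sketch).
struct Tick {
    region: &'static str,
    count: u32,
}

/// Spawns one thread per region. Each spec is (name, period in ms, number of ticks).
/// Returns every tick in the order the channel received it.
fn run_regions(specs: &[(&'static str, u64, u32)]) -> Vec<Tick> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for &(name, period_ms, ticks) in specs {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for count in 1..=ticks {
                // Each region waits on its own period, independent of the others.
                thread::sleep(Duration::from_millis(period_ms));
                tx.send(Tick { region: name, count }).expect("receiver dropped");
            }
        }));
    }

    // Drop the original sender so the receiver stops once all regions finish.
    drop(tx);
    let ticks: Vec<Tick> = rx.iter().collect();

    for handle in handles {
        handle.join().expect("region thread panicked");
    }
    ticks
}
```

Calling `run_regions(&[("cortex", 5, 4), ("dmn", 20, 1)])` yields four cortex ticks and one dmn tick, interleaved by wall-clock time rather than by call order. No region waits on another.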

// principles

A cognitive runtime.

A brain has parts. So does she.

Five brain regions, each on its own clock, each doing one job well. Closer to how a brain works than to anything else in this space.

Swap the brain, keep the person.

Her memory survives model swaps, reboots, and new hardware. Who she is isn't tied to any one weight set. Rebuild her tomorrow — she'll still remember today.
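The idea is easiest to see when memory and model are separate fields: replace one and the other is untouched. The following is a sketch under that assumption, not Lethe's real design. The `Model` trait, the `EchoModel` stand-in, and the `Assistant` struct are invented for illustration, and a real system would persist memory to disk rather than hold it in a `Vec`.

```rust
/// Anything that can answer a prompt given the current memory (hypothetical trait).
trait Model {
    fn name(&self) -> &str;
    fn reply(&self, prompt: &str, memory: &[String]) -> String;
}

/// Stand-in model that echoes the prompt; a real one would call an LLM API.
struct EchoModel {
    label: String,
}

impl Model for EchoModel {
    fn name(&self) -> &str {
        &self.label
    }

    fn reply(&self, prompt: &str, memory: &[String]) -> String {
        format!("[{}] {} (remembering {} notes)", self.label, prompt, memory.len())
    }
}

/// Memory lives beside the model, not inside it.
struct Assistant {
    memory: Vec<String>,
    model: Box<dyn Model>,
}

impl Assistant {
    fn new(model: Box<dyn Model>) -> Self {
        Assistant { memory: Vec::new(), model }
    }

    fn remember(&mut self, note: &str) {
        self.memory.push(note.to_string());
    }

    /// Replaces the model; memory is left exactly as it was.
    fn swap_model(&mut self, model: Box<dyn Model>) {
        self.model = model;
    }

    fn ask(&self, prompt: &str) -> String {
        self.model.reply(prompt, &self.memory)
    }
}
```

After `remember(...)` followed by `swap_model(...)`, the next `ask` is answered by the new model with the old notes still in hand.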

One Rust binary. Yours.

~50 MB, statically linked, boots in milliseconds. One file you drop in, with none of the Python-and-container pinball to wire up first. Runs as a systemd service on Linux and switches between Anthropic, OpenAI, OpenRouter, or local Gemma without touching anything else.

// get started

Two minutes to memory.

Rather not run it yourself? Try hosted Lethe — free for two weeks →

Cloud LLM

Install

One command. Works on macOS and Linux.

curl -fsSL https://lethe.gg/install | bash

Say hello

Message your bot on Telegram. From this point on, she remembers.

// you'll need

A Telegram Bot Token: message @BotFather, send /newbot

An LLM API key or subscription: OpenRouter, Anthropic (API key or Claude subscription), or OpenAI

Your Telegram User ID: message @userinfobot

Local (Gemma 4)

Build llama.cpp

git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build -DGGML_CUDA=ON && cmake --build build -j$(nproc)

Start the model server

Download a Gemma 4 31B GGUF and run the server on port 8090, the port LLM_API_BASE points at in the next step:

llama-server --model gemma-4-31B-it-Q8_0.gguf \
  --split-mode tensor --jinja --reasoning-budget 4096 \
  --ctx-size 98304 --parallel 2 --flash-attn on -fit off \
  --port 8090

Install Lethe & configure

curl -fsSL https://lethe.gg/install | bash

# then set in .env:
LLM_PROVIDER=openai
LLM_API_BASE=http://localhost:8090/v1
OPENAI_API_KEY=local
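For readers curious how a file like this is typically consumed, here is a minimal `.env` reader in Rust. It is a generic sketch, not Lethe's loader: the `parse_env` function is hypothetical, and it handles only plain `KEY=VALUE` lines, `#` comments, and blank lines (no quoting or variable expansion).

```rust
use std::collections::HashMap;

/// Parses simple KEY=VALUE lines, skipping blank lines and `#` comments.
/// Later duplicates of a key overwrite earlier ones.
fn parse_env(text: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split on the first '=' only, so values may themselves contain '='.
        if let Some((key, value)) = line.split_once('=') {
            vars.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    vars
}
```

Fed the three lines above, the map holds `LLM_PROVIDER`, `LLM_API_BASE`, and `OPENAI_API_KEY`. The base URL keeps its full value because the split happens only at the first `=`.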

// you'll need

GPUs with 48 GB or more of total VRAM (2x RTX 4090 for Q4, 4x for Q8)

A Telegram Bot Token

A Gemma 4 31B GGUF model file

See the full local setup guide in the README.

// hosted

Let us run her.

Same Lethe — the memory, the background thinking, the 3 AM drift — except she runs on our servers instead of yours. We keep her up; you just talk to her, in the browser or on Telegram. Free to start, $19.95 a month once she's earned it.

Start free → Or self-host it →

Free for two weeks. Keep her if she's worth it, walk if she isn't.
