Lethe – Brain-Centric AI Assistant


Brain-inspired · Persistent · Written in Rust · Open source

An AI that actually thinks when you're not talking to it.

Lethe is a cognitive runtime with brain regions that run on their own clocks: one that talks, one that remembers, one that drifts through your old conversations at 3 AM looking for things that slipped. She lives wherever you put her, on your machine or ours, and she never stops thinking.

Try Lethe free →
Self-host it

She's already running on our side — sign in and start talking. Free for two weeks, card only if you keep her.

// what this changes

She holds the thread for as long as you do.

She remembers the work, notices what slipped, and decides on her own when something is worth interrupting you for.

two weeks later

you: how's the thing going

lethe: The Springer resubmission? You uploaded the revised manuscript on the 2nd, but I never saw a confirmation email come through. Want me to check the submission portal?

mid-conversation

you: audit the deployment scripts

lethe: Already on it. I sent a subagent to do the review. It'll check in every couple of minutes and ping me with the result. What else?

unprompted

lethe: You haven't mentioned the visa renewal since March 12. The window closes in 9 days. Want me to draft the email to the consulate, or are you already on it?

// architecture

The brain names aren't metaphors.

Each region is a real actor with its own clock and its own logs, named for the part of the brain whose job it does.

cortex

The voice. Picks tools, delegates work, decides when to speak and when to shut up and let you think.

hippocampus

Memory with opinions. Retrieves what's load-bearing right now and lets the rest fade.

dmn

Default-mode network. Runs while you're away: drifts across goals, connects dots, catches what everyone else missed.

brainstem

The brainstem. Boots the system, watches resources, keeps the process alive. You never talk to it. That's the point.

subagents

Disposable workers she spins up for a job and throws away when it's done. She keeps talking while they work.

attention gate

Filters background thoughts. Most aren't worth your time. The ones that are get through.
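One way to picture the attention gate is as a pair of thresholds on a salience score. The sketch below is an illustration only: the `Thought`, `Verdict`, and `AttentionGate` types, the score range, and the threshold values are all assumptions, not Lethe's actual internals.

```rust
/// A background thought with a salience score in [0.0, 1.0].
/// Hypothetical type for illustration, not Lethe's real data model.
#[derive(Debug, Clone, PartialEq)]
pub struct Thought {
    pub source: &'static str,
    pub text: String,
    pub salience: f32,
}

/// What the gate decides to do with a thought.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Not worth anyone's time; let it fade.
    Drop,
    /// Worth keeping, but the cortex decides whether to speak.
    Hold,
    /// Important enough to interrupt the user.
    Interrupt,
}

/// Two assumed thresholds: below `hold_at` a thought is dropped,
/// at or above `interrupt_at` it goes straight through.
pub struct AttentionGate {
    pub hold_at: f32,
    pub interrupt_at: f32,
}

impl AttentionGate {
    pub fn review(&self, thought: &Thought) -> Verdict {
        if thought.salience >= self.interrupt_at {
            Verdict::Interrupt
        } else if thought.salience >= self.hold_at {
            Verdict::Hold
        } else {
            Verdict::Drop
        }
    }
}
```

With `hold_at: 0.4` and `interrupt_at: 0.8`, a thought scored 0.5 would be held for the cortex, while one scored 0.1 would be dropped. How the real system scores salience is not described on this page.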

01:24:18  dmn          background cognition complete. found possible deadline drift
01:24:19  hippocampus  recall triggered. 2 notes, 3 conversation matches, salience bias active
01:24:20  cortex       delegation decision. spawned subagent: deployment audit
01:26:20  subagent     progress report. checked install path, reviewing update path
01:26:21  attention    notification reviewed. held for cortex decision
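The "own clock, own logs" idea can be sketched with plain threads and a channel: each region wakes on its own period and reports into a shared log stream. This is a minimal sketch using only the Rust standard library. The `LogLine` type, the `run_regions` function, and the periods are assumptions for illustration, and the page does not say how Lethe actually schedules its regions.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// One log line emitted by a region (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct LogLine {
    pub region: &'static str,
    pub tick: u32,
}

/// Spawn one thread per region. Each region sleeps for its own period,
/// emits a log line, and repeats `ticks` times before exiting.
/// Returns every line in the order the collector received it.
pub fn run_regions(regions: &[(&'static str, Duration)], ticks: u32) -> Vec<LogLine> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for &(name, period) in regions {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for tick in 0..ticks {
                thread::sleep(period);
                // A send error only means the collector is gone; ignore it here.
                let _ = tx.send(LogLine { region: name, tick });
            }
        }));
    }

    // Drop the original sender so the receive loop ends once all regions finish.
    drop(tx);
    let lines: Vec<LogLine> = rx.iter().collect();

    for handle in handles {
        let _ = handle.join();
    }
    lines
}
```

Calling `run_regions` with a fast "cortex" period and a slower "dmn" period interleaves their lines by wall-clock time, which is roughly the shape of the log excerpt above. The interleaving order depends on thread scheduling, so only the per-region counts are deterministic.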

// principles

A cognitive runtime.

A brain has parts. So does she.

Five brain regions, each on its own clock, each doing one job well. Closer to how a brain works than to anything else in this space.

Swap the brain, keep the person.

Her memory survives model swaps, reboots, and new hardware. Who she is isn't tied to any one weight set. Rebuild her tomorrow and she'll still remember today.

One Rust binary. Yours.

~50 MB, statically linked, boots in milliseconds. One file you drop in, with none of the Python-and-container pinball to wire up first. Runs as a systemd service and swaps between Anthropic, OpenAI, OpenRouter, or local Gemma without touching anything else.

// get started

Two minutes to memory.

Rather not run it yourself? Try hosted Lethe — free for two weeks →

Cloud LLM

Install

One command. Works on macOS and Linux.

curl -fsSL https://lethe.gg/install | bash

Say hello

Message your bot on Telegram. From this point on, she remembers.

// you'll need

A Telegram Bot Token: message @BotFather, send /newbot

An LLM API key or subscription: OpenRouter, Anthropic (API key or Claude subscription), or OpenAI

Your Telegram User ID: message @userinfobot

Local (Gemma 4)

Build llama.cpp

git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
cmake -B build -DGGML_CUDA=ON && cmake --build build -j$(nproc)

Start the model server

Download a Gemma 4 31B GGUF and run:

llama-server --model gemma-4-31B-it-Q8_0.gguf \
  --port 8090 \
  --split-mode tensor --jinja --reasoning-budget 4096 \
  --ctx-size 98304 --parallel 2 --flash-attn on -fit off

Install Lethe & configure

curl -fsSL https://lethe.gg/install | bash

# then set in .env:
LLM_PROVIDER=openai
LLM_API_BASE=http://localhost:8090/v1
OPENAI_API_KEY=local
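Before starting the service, it can help to sanity-check that the `.env` file contains the three keys shown above. The sketch below parses `KEY=VALUE` lines and reports which required keys are missing. It is a hedged illustration: `parse_env` and `missing_keys` are hypothetical helpers, not part of Lethe, and Lethe's own loader may handle quoting or other `.env` features that this sketch ignores.

```rust
use std::collections::HashMap;

/// Parse KEY=VALUE lines, skipping blank lines and `#` comments.
/// Only the first `=` splits the line, so URLs in values stay intact.
pub fn parse_env(text: &str) -> HashMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Return the keys from `required` that are absent or set to an empty value.
pub fn missing_keys<'a>(env: &HashMap<String, String>, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|key| env.get(*key).map_or(true, |value| value.is_empty()))
        .collect()
}
```

For the local setup, checking for `LLM_PROVIDER`, `LLM_API_BASE`, and `OPENAI_API_KEY` against the snippet above should return an empty list. The `LLM_API_BASE` port has to match the port the model server listens on.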

// you'll need

GPUs with about 48 GB of VRAM or more (2x RTX 4090 for Q4, 4x for Q8)

A Telegram Bot Token

A Gemma 4 31B GGUF model file

See full local setup guide in the README

// hosted

Let us run her.

Same Lethe, with the memory, the background thinking, and the 3 AM drift, except she runs on our servers instead of yours. We keep her up; you just talk to her, in the browser or on Telegram. Free to start, $19.95 a month once she's earned it.

Start free →
Or self-host it →

Free for two weeks. Keep her if she's worth it, walk if she isn't.
