Replacing RAG with a cognitive memory stack in Elixir/OTP

Skynet: Towards Synthetic Neurobiology // 0xcc.re

The original idea was a joke. I was looking at LLM loops and thinking about how they map onto Elixir’s actor model: GenServers that receive messages, process them, maybe spawn new processes. The jump from “LLM reasoning step” to “GenServer handling a message” is not that far, and once you see it you can’t unsee it. What if you gave a Soul GenServer access to Code.eval_string? What if the agents could fork themselves, spawn new processes mid-reasoning, grow the supervision tree dynamically?

I called it Skynet mostly because I thought it was funny. Then I kept building it, and it stopped being funny and started being interesting.

The actual problem

If you’ve used LLMs for anything beyond simple Q&A, you’ve hit the amnesia problem. Give a model enough context and it starts to forget things from the beginning of the conversation. Run a long session and by the end it has no reliable idea what happened 30 messages ago. Agents are worse: they run potentially forever, handling new events constantly, and there’s no good answer to “how do you give an agent persistent memory that isn’t just dumping the whole history into the context window every time?”

The standard answer is RAG: embed everything, search on query, inject the results. It works, sort of. But it has a specific failure mode: you get fragments. Relevant-ish chunks from the vector database, concatenated into the prompt, with no guarantee they form a coherent picture. The agent might have deep knowledge about something (dozens of experiences that together tell a clear story), but what it gets is five loosely related paragraphs.

And there’s another problem. Standard RAG busts the prompt cache on every turn because the retrieved context changes. Prompt caches match on an unchanged prefix, so everything from the changed context onward is reprocessed. That means every LLM call is effectively cold, and you pay for it in both latency and cost.
I kept thinking about how biological memory actually works, and the architecture that started emerging looked a lot more like neuroscience than like software engineering.

The architecture

Skynet’s agents (I call them Souls, taken from openclaw but improved) run as long-lived Elixir GenServers. Each Soul has a layered cognitive stack: eight modules, each one solving a specific problem in the memory system, each one inspired by a mechanism from neuroscience. That is partly because simulating consciousness is a funny goal to have, and partly because it turns out biological memory has solved most of the problems I was running into, and the solutions translate surprisingly well to code.

    defmodule Souls.SoulServer do
      use GenServer, restart: :transient

      # One process per Soul, registered under its slug.
      def start_link(%Soul{} = soul) do
        GenServer.start_link(__MODULE__, soul, name: via(soul.slug))
      end

      # Fire-and-forget delivery; returns an error tuple if the Soul is not running.
      def send_event(slug, event_type, content, opts \\ %{}) do
        case GenServer.whereis(via(slug)) do
          nil -> {:error, :not_running}
          pid -> GenServer.cast(pid, {:event, event_type, content, opts})
        end
      end

      defp via(slug), do: {:via, Registry, {Souls.Registry, slug}}
    end
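To make the supervision behaviour of a server like this concrete, here is a minimal, self-contained sketch. It is not Skynet's code: it assumes a DynamicSupervisor and a unique-key Registry (the post does not show how Souls are actually supervised), and the `Sketch.*` modules, the one-field struct, and the `Demo` helper are hypothetical stand-ins.

```elixir
defmodule Sketch.Soul do
  # Hypothetical stand-in for the post's %Soul{} struct; only the slug matters here.
  defstruct [:slug]
end

defmodule Sketch.SoulServer do
  use GenServer, restart: :transient

  def start_link(%Sketch.Soul{} = soul) do
    GenServer.start_link(__MODULE__, soul, name: via(soul.slug))
  end

  # Returns the pid registered under the slug, or nil if none is alive.
  def whereis(slug), do: GenServer.whereis(via(slug))

  @impl true
  def init(soul), do: {:ok, soul}

  defp via(slug), do: {:via, Registry, {Sketch.Registry, slug}}
end

defmodule Sketch.Demo do
  def run do
    {:ok, _registry} = Registry.start_link(keys: :unique, name: Sketch.Registry)
    {:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

    soul = %Sketch.Soul{slug: "ada"}
    {:ok, first} = DynamicSupervisor.start_child(sup, {Sketch.SoulServer, soul})

    # Abnormal exit: a :transient child is restarted by its supervisor.
    Process.exit(first, :kill)

    second =
      await(fn ->
        case Sketch.SoulServer.whereis("ada") do
          ^first -> nil
          other -> other
        end
      end)

    # Clean exit (reason :normal): a :transient child is not restarted.
    :ok = GenServer.stop(second, :normal)
    Process.sleep(100)

    %{
      restarted_after_crash: is_pid(second) and second != first,
      restarted_after_clean_exit: Sketch.SoulServer.whereis("ada") != nil
    }
  end

  # Poll until fun returns something other than nil, or give up after ~1 second.
  defp await(fun, tries \\ 100) do
    case fun.() do
      nil when tries > 0 ->
        Process.sleep(10)
        await(fun, tries - 1)

      result ->
        result
    end
  end
end
```

Calling `Sketch.Demo.run()` kills the process once and then stops it normally. Because the name goes through a `:via` tuple, the restarted process is reachable under the same slug, so callers never need to hold on to a pid.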

A Soul receives events (messages from users, other agents, channel integrations, scheduled heartbeats), processes them through a multi-turn LLM loop with tools, and maintains persistent state across restarts. The restart: :transient option means the supervisor will bring the process back if it crashes, but won’t restart it if it exits cleanly.

What makes Souls interesting isn’t the GenServer wrapper. It’s the memory stack underneath. Here’s what the memory stack looks like at a high level:

    graph TB
        subgraph "Short-term (per session)"
            OBS[observe/4\nraw impression per turn] --> REF[reflect/1\ncompresses N obs → summary]
            REF --> SUM[consolidated summary\nstable, injected into system prompt]
        end

        subgraph "Medium-term (nightly)"
            MW[MemoryWorker\n03:00 UTC] --> WEEK[weekly memory digest]
        end

        subgraph "Long-term (persistent)"
            PRISM[Prism\nRust vector search engine]
        end

        SUM --> MW
        MW --> PRISM
        OBS -.->|indexed| PRISM

The key insight with the short-term layer is that the consolidated summary is stable between turns. It only changes when the reflector runs. That means the system prompt is the same from turn to turn between reflections, the prompt cache stays warm, and LLM calls are cheap. Standard RAG changes the context on every retrieval and blows the cache constantly. This doesn’t.

The three-tier structure maps surprisingly cleanly onto Penrose-Hameroff’s Orchestrated Objective Reduction theory, a controversial hypothesis about consciousness arising from quantum coherence in microtubules. The theory proposes three levels: sub-neural quantum substrate, neural firing patterns, and stable conscious experience. Whether or not you buy the quantum consciousness part, the structural model is independently useful: raw unprocessed input → pattern compression → stable world model is the same architecture whether you’re describing neurons or Elixir...
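The short-term layer described above (raw observations per turn, periodic compression, a summary that stays fixed in between) can be sketched as a small pure data structure. This is an illustration under assumptions, not Skynet's implementation: the module name, struct fields, threshold, and the simplified `observe/2` signature are hypothetical (the post names `observe/4` and `reflect/1` but does not show them), and the default summarizer just joins strings where the real system presumably calls an LLM.

```elixir
defmodule Sketch.ShortTermMemory do
  # Hypothetical sketch; names and fields are illustrative only.
  defstruct observations: [], summary: "", threshold: 3, summarize: nil

  def new(opts \\ []) do
    %__MODULE__{
      threshold: Keyword.get(opts, :threshold, 3),
      summarize: Keyword.get(opts, :summarize, &default_summarize/2)
    }
  end

  # Record a raw impression. The summary, and so the system prompt, is untouched.
  def observe(%__MODULE__{} = mem, text) when is_binary(text) do
    %{mem | observations: [text | mem.observations]}
  end

  # Fold buffered observations into the summary once enough have accumulated.
  def reflect(%__MODULE__{observations: obs, threshold: n} = mem) when length(obs) >= n do
    %{mem | summary: mem.summarize.(mem.summary, Enum.reverse(obs)), observations: []}
  end

  # Below the threshold, reflecting is a no-op.
  def reflect(%__MODULE__{} = mem), do: mem

  # The prompt depends only on the summary, so it is byte-identical between reflections.
  def system_prompt(%__MODULE__{summary: summary}) do
    "You are a Soul.\n\nWhat you remember so far:\n" <> summary
  end

  # Placeholder compression: concatenate the old summary and the new observations.
  defp default_summarize(previous, observations) do
    [previous | observations]
    |> Enum.reject(&(&1 == ""))
    |> Enum.join("; ")
  end
end
```

With `threshold: 2`, observing once and calling `reflect/1` leaves `system_prompt/1` unchanged; only after the second observation does the summary, and with it the prompt, move. A cache keyed on the prompt prefix would therefore be invalidated once per reflection rather than once per turn.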
