Why memory is not enough

Ask anyone building agents what “memory” means and you’ll get the same answer: a place to<br>store facts and decisions so the agent can retrieve them later. A vector DB, a knowledge<br>graph, a directory of notes, take your pick. It remembers what you discussed last week.<br>That part is real, it’s useful, and there are now solid tools that do it well.
Now ask a different question. What happens when the session you’re in right now crosses<br>180K tokens and the agent starts forgetting how it began?
The answer you tend to get is a list of chores.
Spin up a background agent so the heavy work stays off your main thread. Write the plan to<br>a file. Then a second plan, and a third, stacked on top of each other. Open a trail of<br>GitHub issues. Leave notes in scratch markdown files. Keep a tidy AGENTS.md, and prompt<br>more carefully while you’re at it. Every one of these is the same move: manually push state<br>out of the window and hope it finds its way back when it matters. That isn’t managing<br>your context. It’s you doing the filing.
Maybe your tools handle that filing for you now, pulling it straight from the conversation<br>so you never lift a finger. Better, but capture was never the hard part. What gets written<br>down isn’t the question. What stays in the window this turn is, and saving it off to a<br>store, by hand or automatically, doesn’t decide that.
And when the filing isn’t enough, when the window actually fills mid-task, the one<br>automatic mechanism every major agent ships finally kicks in: compaction. The client<br>summarizes the older turns into a lossy blob, drops the originals from the window, and hands<br>you back an agent that was a genius a minute ago and now can’t quite remember its own name.
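To make that concrete, here is a minimal sketch of what a compaction step amounts to. It is not any particular client's implementation: `Turn`, `count_tokens`, `summarize`, and `compact` are hypothetical names, the token counter is a crude word count, and the "summarizer" just truncates each turn where a real client would call a model.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (a crude stand-in).
    return len(text.split())


def window_tokens(window: list[Turn]) -> int:
    return sum(count_tokens(t.text) for t in window)


def summarize(turns: list[Turn], words_per_turn: int = 3) -> Turn:
    # Hypothetical lossy summarizer: keeps only the first few words of each turn.
    # A real client would ask a model for a summary here; the loss is the point.
    kept = []
    for t in turns:
        kept.extend(t.text.split()[:words_per_turn])
    return Turn("system", "Summary: " + " ".join(kept))


def compact(window: list[Turn], budget: int, keep_recent: int = 2) -> list[Turn]:
    """When the window is over budget, replace every turn except the most
    recent `keep_recent` with a single summary turn."""
    if window_tokens(window) <= budget:
        return window
    split = max(len(window) - keep_recent, 0)
    old, recent = window[:split], window[split:]
    if not old:
        return window
    # Nothing here checks whether `old` holds something still needed.
    return [summarize(old)] + recent


window = [
    Turn("user", "Use the staging database only and never touch production tables"),
    Turn("assistant", "Understood, I will run every migration against staging first"),
    Turn("user", "Now refactor the auth module"),
    Turn("assistant", "Refactoring the auth module now"),
]
compacted = compact(window, budget=20)
for turn in compacted:
    print(f"{turn.role}: {turn.text}")
```

In this toy run the constraint about production tables sits in the older half of the window, so it is exactly what the summary throws away. The two most recent turns survive untouched.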
This is the fix we all quietly accepted, or got talked into, for something that happens<br>every single session. Often more than once in the same one. Sit with that for a second:<br>the most predictable failure in agentic coding, the one you can set your watch by, and the<br>state of the art is a guillotine that drops the moment you fall behind. Nobody set out to<br>confuse context management with memory. We just decided the window was your problem to<br>babysit, and moved on. So why has nobody built the thing that actually keeps up?
Two problems hiding behind one word
| What you say | What it actually needs |
| --- | --- |
| “What did we decide about auth last week?” | A long-term memory store. |
| “Wait, what was the other thing you said we should do after this?” | Active context-window management. |

These are not the same layer. A long-term store sits beside your conversation, like a<br>notebook you keep open on the desk. It only holds what you bothered to write down, and it<br>only helps when you reach for it. That’s exactly right for what carries across sessions: a<br>decision, a preference, a constraint from days ago. This part has had real product<br>attention, and it shows.
The live window is the other half, and it’s the one you’re left to manage by hand, mid-task.<br>People do this well, but it’s a tax: every note you set down pulls your focus off the actual<br>problem, and you have to remember to pick it back up later. And every tool for it is the<br>same shape: static (a markdown file the model may or may not read), offline (an indexer you<br>run between sessions), or just advice (“prompt better”). None of it is in the loop at the one<br>moment that matters, when the window is overflowing while you work. The one mechanism that<br>does fire on its own is compaction.
A store on the sidelines can’t intervene
That compaction step is triage with a blunt instrument: it runs whether or not it’s about to<br>drop something you still need, with no idea what’s worth keeping. Now bolt the best long-term<br>memory store on the planet onto the same session. What changes? Nothing. The store can answer<br>a question if you ask it, but compaction doesn’t ask questions. It just runs. The store<br>never gets a vote on what survives. So you can have flawless recall of last week and still<br>watch the agent get amnesia at 200K tokens.
Some tools go further than a hand-fed store: they keep the entire conversation and let you<br>search over it. That’s genuinely better, and it’s worth saying so. But searching is<br>something you have to do, after you’ve already noticed something went missing, and<br>whatever you pull back lands in the same window that was overflowing in the first place.<br>You found the needle, and the haystack is still on fire. And automating the search doesn’t<br>save it: async, agent-native retrieval still drops what it finds into the same window, and<br>still never decides what leaves it.
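A small sketch of that last point, under loud assumptions: `retrieve` is a hypothetical keyword search over a saved transcript, `inject` is a hypothetical helper that adds results to the live window, and tokens are counted as whitespace-separated words. No real tool's API is being described here.

```python
def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(text.split())


def retrieve(archive: list[str], query: str, top_k: int = 1) -> list[str]:
    # Hypothetical search: rank archived turns by words shared with the query.
    query_words = set(query.lower().split())
    ranked = sorted(
        archive,
        key=lambda turn: len(query_words & set(turn.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def inject(window: list[str], snippets: list[str]) -> list[str]:
    # Retrieval only ever adds to the window; nothing here picks what leaves.
    return window + snippets


budget = 12
archive = [
    "never touch the production tables",
    "use tabs not spaces",
    "the auth module lives in src/auth",
]
window = ["refactor the auth module now", "running the test suite for auth"]

found = retrieve(archive, "which tables must we never touch")
window_after = inject(window, found)

before = sum(count_tokens(t) for t in window)
after = sum(count_tokens(t) for t in window_after)
print(found, before, after, after > budget)
```

The search finds the right turn, and the window that was just under budget is now over it. Whatever trims it back down is a separate decision that the retrieval step never makes.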
There’s a quieter cost on top of that: models use what’s already in the window far more<br>reliably than what they have to go fetch. Hand a model the relevant text in-context and it<br>beats retrieving the same text on quality<br>(Li et al.), and what it does hold, it reads best at the<br>front and back of the window, not buried in the middle<br>(Lost in the Middle). Retrieval does have one honest<br>advantage worth...