Vector memory database remembers everything. That's the issue

Your vector memory database remembers everything. That’s exactly the issue. | by Vektor Memory | Jun, 2026 | Medium


Your vector memory database remembers everything. That’s exactly the issue.

Vektor Memory

9 min read· Just now


There is a design assumption baked into almost every vector database and AI memory implementation that sounds reasonable until you watch it grow nodes in production: that remembering more is always better. Testing and refining our AUDN code showed us that this is not quite right.

After running VEKTOR Slipstream against real development sessions for 99 days, the database held 1,413 stored memories across four namespaces. Looking at the importance score distribution, 83 percent of those memories sat below 0.25 out of 1.0, which the system treats as the noise floor. The remaining 17 percent, just 60 memories out of 1,413, sat above 0.75 and dominated every recall result.

This is exactly what a curation layer is supposed to produce. Those 1,154 low-scored memories are accurate. They are not deleted, and they are retrievable by direct query. What they are not is important enough to compete with the 60 high-signal entries every time the agent needs context. AUDN penalised them gradually over hundreds of writes because similar, more specific, or more frequently reinforced memories covered the same ground better. The system created a hierarchy.

Without curation, all 1,413 memories would compete equally for every recall slot, and the agent would consistently surface redundant, lower-value context alongside the things that actually matter. That is what standard vector memory looks like without a curation layer: a slow, invisible degradation that nobody notices until the agent starts confidently giving you answers that are three months out of date.

Every memory node in Vektor carries an importance score between 0 and 1. When a memory is first stored, it receives a score based on the content’s estimated significance. That score is not fixed. Every time a new memory arrives that is semantically related but not directly contradictory, the existing memory receives a compatible verdict and takes a small redundancy penalty. The penalty is intentionally modest: a factor based on how similar the incoming content is, typically reducing the score by 10 to 15 percent per occurrence. But across hundreds of sessions, the effect compounds. A memory about project tooling that is restated by similar writes across a dozen conversations will have its score driven down steadily until it sits below the noise floor threshold, where it no longer competes in active recall.

The noise floor is not a bin for broken or wrong memories. It is where memories go when the system has determined they are not the most important version of what they represent. They are still stored and still retrievable by direct query. They simply stop crowding recall alongside the 60 high-signal entries that floated to the top of the distribution. This is the intended behavior: a natural hierarchy where what matters most surfaces first, and everything else remains available without contributing noise to every retrieval.
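To make the compounding concrete, here is a minimal sketch of a multiplicative redundancy penalty. It is not the AUDN implementation: the linear mapping from similarity to a 10 to 15 percent penalty, and the names `penalty_rate`, `apply_redundancy_penalty`, and `writes_until_noise_floor`, are assumptions made for illustration. Only the 0.25 noise floor and the 10 to 15 percent range come from the text above.

```python
NOISE_FLOOR = 0.25  # noise floor threshold quoted in the article


def penalty_rate(similarity: float, low: float = 0.10, high: float = 0.15) -> float:
    """Map a similarity in [0, 1] to a penalty between low and high.

    The linear mapping is an assumption; the article only says the factor
    depends on how similar the incoming content is.
    """
    s = min(max(similarity, 0.0), 1.0)  # clamp to [0, 1]
    return low + (high - low) * s


def apply_redundancy_penalty(score: float, similarity: float) -> float:
    """Reduce an importance score once, for a single related write."""
    return score * (1.0 - penalty_rate(similarity))


def writes_until_noise_floor(score: float, similarity: float) -> int:
    """Count the related writes needed before the score drops below the floor."""
    writes = 0
    while score >= NOISE_FLOOR:
        score = apply_redundancy_penalty(score, similarity)
        writes += 1
    return writes
```

Under these assumptions, a memory that starts at 0.8 falls below the noise floor after 8 related writes at the 15 percent rate (`writes_until_noise_floor(0.8, 1.0)`) and after 12 at the 10 percent rate (`writes_until_noise_floor(0.8, 0.0)`), which is in line with the "dozen conversations" scale described above.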

The Mechanism Nobody Talks About

Vector databases are extraordinarily good at one thing: storing information and finding items that are semantically similar to a query. That is genuinely useful, but on its own it is not the best method currently available.

When a user tells your agent “I work in finance” in January, and “I left banking last month” in April, a vector store dutifully records both facts. The embeddings sit close together in the vector space because they are about the same topic. When you query for professional context in May, you get both back. The agent receives two conflicting truths with no metadata to tell it which one is current, and it does what language models do when given ambiguous context: it synthesises a plausible-sounding answer that may or may not reflect reality.

This is not a retrieval problem. You cannot fix it at recall time by adding better filters or smarter reranking, because by the time you are querying, the contradiction is already in the graph and competing for attention. The only place to fix it is at the write layer, before the conflicting fact is committed. This is the insight that drove the architecture of the AUDN gate.

Below is a real production graph at work, with semantic, causal, temporal, and entity nodes in formation.

Production graph, temporal nodes only
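The finance and banking example can be reproduced with a toy store. This is a sketch under stated assumptions: the three-dimensional vectors are hand-made stand-ins for real embeddings, and `memories`, `cosine`, and `recall` are hypothetical names, not part of Vektor or any vector database API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-made "embeddings". The first two point in nearly the same direction
# because both statements are about the user's profession.
memories = [
    {"text": "I work in finance", "vec": [0.9, 0.4, 0.1]},          # said in January
    {"text": "I left banking last month", "vec": [0.8, 0.5, 0.2]},  # said in April
    {"text": "I prefer dark roast coffee", "vec": [0.1, 0.1, 0.9]},
]


def recall(query_vec, k=2):
    """Return the k stored texts most similar to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return [m["text"] for m in ranked[:k]]


# A query about professional context, asked in May.
results = recall([0.85, 0.45, 0.1])
```

With these vectors, `results` contains both profession statements and nothing that says which one is current. Similarity ranking alone cannot separate them, which is why the fix has to happen before the second fact is written.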

What a Write-Layer Curation Gate Actually Does

AUDN runs synchronously on every single memory write before anything touches the database. Every incoming piece of information is compared against the 200 most recent active memories using cosine similarity, which is a pure SQLite operation that completes in under two...
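The comparison step described above can be sketched roughly as follows. Everything except the 200-memory window is an assumption: the `memories` table schema, the JSON-encoded embeddings, the 0.80 `RELATED_THRESHOLD`, and the function names are hypothetical. The sketch also computes cosine similarity in Python over rows fetched from SQLite, whereas the article describes a pure SQLite operation. What the gate decides once it has found related memories is not covered in the text above, so the sketch stops at reporting them.

```python
import json
import math
import sqlite3

RECENT_WINDOW = 200       # from the article: the 200 most recent active memories
RELATED_THRESHOLD = 0.80  # hypothetical cut-off for "semantically related"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_store():
    """Create an in-memory store with a hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE memories ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "text TEXT NOT NULL, "
        "embedding TEXT NOT NULL, "
        "active INTEGER NOT NULL DEFAULT 1)"
    )
    return db


def gated_write(db, text, embedding):
    """Compare the incoming memory with recent active ones, then commit it.

    Returns (new_row_id, related), where related lists (id, similarity)
    pairs for existing memories at or above RELATED_THRESHOLD.
    """
    rows = db.execute(
        "SELECT id, embedding FROM memories WHERE active = 1 "
        "ORDER BY id DESC LIMIT ?",
        (RECENT_WINDOW,),
    ).fetchall()
    related = []
    for row_id, raw in rows:
        similarity = cosine(embedding, json.loads(raw))
        if similarity >= RELATED_THRESHOLD:
            related.append((row_id, similarity))
    cursor = db.execute(
        "INSERT INTO memories (text, embedding) VALUES (?, ?)",
        (text, json.dumps(embedding)),
    )
    db.commit()
    return cursor.lastrowid, related
```

A short usage example with toy vectors: after `db = open_store()`, writing `gated_write(db, "I work in finance", [0.9, 0.4, 0.1])` finds nothing related, while a following `gated_write(db, "I left banking last month", [0.8, 0.5, 0.2])` reports the first row as related. That report is the point at which a curation gate can act before the new fact competes in recall.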
