A practical guide to defending your agent memory from attacks

A practical guide to defending your agent memory from attacks. | by Vektor Memory | Jun, 2026 | MediumSitemapOpen in appSign up Sign in

Medium Logo

Get app Write

A practical guide to defending your agent memory from attacks.

Vektor Memory

8 min read· Just now

Listen

From prompt injection, poisoning, and silent exfiltration. Press enter or click to view image in full size

by VEKTOR Memory | 10 min read

In the last piece we looked at the threat landscape from the outside. Researched the attack taxonomy and governance gap. The ten surfaces that make agentic AI a genuinely novel privacy problem. This one goes a level deeper. Not what the problem is, but what you can actually do about it in code, in architecture, and in practice. Specifically: what does a security layer for agent memory actually look like, and what did we learn building one. Most writing on agentic AI security stays at the problem description layer. Here are the attacks. Here is why they work. Here is what percentage of models are vulnerable. That is useful, but it leaves a gap. If you are someone building with agents or thinking seriously about deploying them, the question you actually want answered is: what do I implement, and in what order? The DeepMind AI Agent Traps paper identifies six attack categories. The one that matters most for memory systems is persistent memory corruption, where an attacker plants data into long-term memory that activates as malicious when retrieved in a future context. Demonstrated success rates in research exceed 80% with less than 0.1% data poisoning. That number is worth sitting with. You do not need to corrupt most of the memory. You need to corrupt almost none of it. The implication for anyone building a memory-backed agent is direct: your memory store is an attack surface, and it is probably the one you have thought least about. Press enter or click to view image in full size

Faraday interface — simulating a canary attack vectorThe classical approach to agent security is input sanitisation. Strip the prompt. Validate the schema. Refuse suspicious patterns. This works for simple pipelines, but it fails for agentic systems operating across multiple tools and sessions for one reason: the attack does not arrive at the input layer. It arrives through a web page your agent visited three sessions ago. Through an email attachment that got summarised and stored. Through a tool description from a server you did not write that changed from when you connected today. The threat arrives through the environment, not the prompt. A proxy that sits between your agent and everything it touches is the right architectural response to this. Our solution creates a secure chokepoint where every interaction can be observed, logged, and evaluated before it reaches memory. This is the problem Faraday is designed to solve. Faraday initialises as part of the VEKTOR MCP server. When it starts, it reads your claude_desktop_config.json and spawns every other MCP server listed there as a child process. Your other tools, file systems, databases, APIs, all of them run through Faraday before anything reaches VEKTOR memory. This is the transparent proxy pattern. From Claude’s perspective, nothing changes. The same tools are available. The same calls work. But every tool schema, every tool call, and every response passes through a set of checks before it is actioned or written to memory. There are four layers. L0: Static scan at connect time. When Faraday spawns a server and retrieves its tool list, it scans every tool name, description, and input schema against a signature library before trusting anything. This catches sleeper patterns, known injection signatures, and anything flagged as CRITICAL or HIGH severity. A blocked tool does not get registered. The agent never sees it. Phase C: Tool pinning. The SHA-256 hash of each tool’s schema is stored on first connect. Every subsequent connection recomputes the hash. If it changed, that is a rug-pull: the server’s tool definitions have been mutated since you last connected. Faraday logs the intercept, blocks the tool, and raises an alert. This is the defence against supply chain attacks where a third-party MCP server you depend on gets compromised between sessions. Canary tokens. At session start, Faraday injects canary tokens into memory through faraday-canary.js. These are synthetic facts with specific, trackable signatures. If a canary value appears in an outbound API call, an exfiltration attempt is in progress. The detection does not rely on understanding the attacker's intent. It relies on the token appearing where it should not. Taint propagation. faraday-taint.js tracks labels through the memory graph. If a memory is marked as tainted because it came from a suspicious source, any memory derived from it inherits that taint label. This is not foolproof, but it narrows the blast radius of a poisoning event by making the contamination traceable. Every...

A practical guide to defending your agent memory from attacks

Related Articles

(no title)

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI

How to Earn a Billion Dollars