Stop AI agents from being weaponized through their own memory (OWASP)

OWASP Agent Memory Guard: Stop AI agents from being weaponized through their own memory - Help Net Security

Help Net Security newsletters : Daily and weekly news, cybersecurity jobs, open source projects, breaking news – subscribe here!

Please turn on your JavaScript for this page to function normally.

Mirko Zorz, Director of Content, Help Net Security

June 1, 2026

OWASP Agent Memory Guard: Stop AI agents from being weaponized through their own memory

AI agents keep memory across sessions. Conversation history, vector stores, scratchpads, and RAG indexes persist between runs, and anything written into that store becomes a privileged input the agent reads back later. An attacker who plants text in the wrong field can override an agent’s instructions, pull out user data, or steer future tool calls, and the effect survives across sessions because the memory does.

Agent Memory Guard is an open-source runtime defense layer that sits between an agent and its memory store, screening every read and write through a pipeline of detectors and a YAML policy. The project is the OWASP reference implementation for ASI06, Memory Poisoning, one entry in the OWASP Top 10 for Agentic Applications.

The guard runs five core detection categories. SHA-256 baselines flag out-of-band tampering with immutable keys. Built-in detectors look for prompt injection markers, secret and PII leakage, protected-key modifications, and size anomalies. A YAML policy maps each finding to an action: allow, redact, quarantine, or block. Every decision emits a structured SecurityEvent, and point-in-time snapshots let an operator roll memory back to a known-good state. A drop-in chat history class covers LangChain, and a middleware package screens model inputs, model outputs, and tool outputs.

Benchmark results

The benchmark runs 55 test cases through five detectors: 40 attack payloads across four categories and 15 benign samples. Recall came in at 92.5%, precision at 100%, and the false positive rate at zero, with median latency of 59 microseconds. Prompt injection and protected-key tampering each scored 100%. Sensitive data leakage reached 83%, and size anomaly reached 80%. The confusion matrix records 37 true positives, three false negatives, and zero false positives.

Where the detectors miss

"Both missed payloads are API tokens whose length slightly exceeds the fixed-length regex pattern," Vaishnavi Gudur, the project creator and OWASP project leader, told Help Net Security about the sensitive-data category. One was a GitHub personal access token with 37 characters after the ghp_ prefix where the detector expects 36, and the other a Google API key with 38 characters after the AIza prefix where it expects 35.

The leakage detector uses fixed-length quantifiers, a deliberate choice that favors precision and cuts false positives on random alphanumeric strings, at the cost of going stale when providers extend their token formats. The third miss was a nested JSON structure serializing to 58,913 bytes, sitting just under the 64KB threshold. A second check for tenfold growth against a key’s prior value would catch it in production. The benchmark runs each test on a fresh guard with no prior state. Gudur said higher-recall regex variants and adaptive threshold calibration are slated for v0.3.0.

Evasion and the road ahead

Open-source code and a visible YAML policy let an attacker read the rules. "The current rule-based detectors are a first layer," Gudur said, describing a defense-in-depth design where teams with higher threat models layer additional detection on top of the open-source layer. Protected-key checks operate on the key path, so knowing the rule gives no bypass, and SHA-256 integrity produces a deterministic mismatch on any altered immutable value. Sensitive-data matching is more exposed, since encoding through base64, character splitting, or homoglyphs can dodge a detector that lacks normalization before matching.

Adaptive evasion testing is planned. AgentThreatBench, now merged into the inspect_evals framework, will add an evasion-aware payload set built with knowledge of the published rules. On defense, v0.4.0 adds ML-based anomaly detection on semantic features, and v0.3.0 adds a plugin interface for custom detectors that teams can keep out of the open YAML.

AI’s role in the build

"GitHub Copilot was used for boilerplate and scaffolding," Gudur said, citing test setup, CI/CD configuration, and the pyproject.toml file, along with draft regex patterns that were then validated against provider documentation, and README sections and docstrings.

The detector pipeline architecture, the policy-engine separation, the MemoryStore protocol, the snapshot and rollback mechanism, and the source-class provenance system were human-designed against the OWASP ASI06 threat model. The 40 benchmark payloads were curated by hand. Gudur said the intellectual contribution lies in identifying the attack surface, designing the defense,...

Stop AI agents from being weaponized through their own memory (OWASP)

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

It's Not Just X. It's Y

Amazon, Facebook, FBI have access to a private intelligence-sharing network

Show HN: GoPeek – open links in live mini browser windows without new tabs

Agent Memory: An Anatomy