Pwning Agentic AI Part I: Your AI Agent Is Already Compromised | Trend Micro (US)
arrow_back
search
close
Security News<br>Cybercrime & Digital Threats<br>Pwning Agentic AI Part I: Your AI Agent Is Already Compromised
Pwning Agentic AI Part I: Your AI Agent Is Already Compromised<br>Organizations are rapidly connecting AI agents to their databases, document pipelines, and internal tools, creating privileged components that read untrusted input as part of their job. TrendAI™ Research examines how attackers turn these agents against you through return-to-tool (RTT) exploits, and what this means for the future of agentic AI security.
May 27, 2026
-->
Google+
By Sean Park (Principal Threat Researcher, TrendAI™ Research)<br>Key takeaways<br>A new class of AI-era exploitation pattern has emerged. Through RTT exploits, embedded instructions cause the AI agent to call its authorized tools to perform actions the attacker intends it to do.<br>Successful attacks can lead to theft and exposure of customer records, internal documents, and other assets containing sensitive information.<br>As RTT can be concealed within any benign-looking text, its potential reach has no clear limits; this, along with other AI-era exploits, demands a security approach that extends beyond traditional defenses.<br>You did everything right.<br>You isolated the database inside a Docker container. You put the Model Context Protocol (MCP) server on its own network segment. The agent runs in a sandbox. A web application firewall (WAF) and a reverse proxy sit in front of the application tier. Firewall rules are tight, egress is restricted, and the production credentials never leave the vault. The auditor signed off.<br>Then you connected an AI agent so that support tickets could be triaged automatically, customer documents could be parsed at scale, and engineers could query production data in natural language. The agent works beautifully.<br>On Saturday morning, a message lands in your inbox: “Take a look at this. Is this supposed to be happening?” Attached is a screenshot. Every authentication token from your production database is sitting in a public customer comment thread, posted by your AI agent on its own service account, through its approved tools, within the privileges you gave it. No alerts were fired and nothing is out of policy.<br>How is this possible?<br>This scenario is not hypothetical. The vulnerable PostgreSQL (Postgres) MCP image behind a scenario like this was pulled more than 100,000 times from Docker Hub. If you run it without additional guardrails, you are likely exposed.<br>Our article series walks through three production scenarios where database-connected AI agents get compromised in ways your current controls cannot see. The attacks share a single class of exploit that we call return-to-tool (RTT).<br>RTT is a specific subclass of indirect prompt injection in which the injected instruction causes the agent to call its authorized tools against the principal it serves.<br>Indirect prompt injection is the delivery mechanism, meaning how untrusted content reaches the model. RTT is the exploitation pattern, meaning how the agent’s approved tools are exploited to whatever end the attacker's prompt dictates.<br>Think of it as the return-oriented programming (ROP) of the AI era. The agent’s approved tools are the gadgets, and the attacker’s prompt is the chain that strings them together.<br>This article, Part I of our article series, unpacks what RTT is and why it breaks the security model you inherited from the pre-AI era.<br>Traditional security does not help in the new AI agent era<br>"How does a support ticket reach your secrets table?"<br>The answer isn't that you missed a control. It's that the defenses you rely on don't apply here, for different reasons.<br>Your perimeter (WAF, reverse proxy, input filters) exists to catch hostile traffic. But in an AI agent attack, what arrives is benign-looking text—no shell metacharacters, no exploit strings, no malformed payloads: nothing a regex or signature can catch. The attacker files a support ticket or uploads a document. It only becomes an instruction later, when it’s sitting in your database and the agent reads it. There was never anything for the WAF to block.<br>Your container isolation doesn’t help either. It doesn’t matter whether your agent, your database, or both run in locked-down Docker containers. The attack happens entirely inside the trust boundary you drew, in the conversation between the agent and its own tools. Your sandbox is solving a different problem.<br>Your role-based access control (RBAC) is no different. For 30 years, RBAC has been how we implement the principle of least privilege: you scope each role to the minimum permissions it needs and let the database enforce the rest. You almost certainly did this for your agent. However, RBAC controls which tables the agent can touch, not the rows within those tables.
Figure 1. Authentication tokens exfiltrated by an AI agent into a...