OpenClaw agent leaked mock AWS keys and CRM data in phishing tests

logickkk11 pts0 comments

Phishing for Lobsters: How We Tricked OpenClaw into Spilling Secrets

Introducing Varonis Atlas: Secure everything you build and run with AI

Learn more

Blog

Threat Research

Phishing for Lobsters: How We Tricked OpenClaw into Spilling Secrets

We built an AI agent and put it through four phishing simulations to reveal critical security gaps and offer solutions to protect your organization's data.

Itay Yashar

7 min read

Last updated June 9, 2026

Contents

Many enterprises are plugging AI agents directly into the inbox. Agents triage email, retrieve internal data, and even respond to emails. The inbox is also the place that’s most exposed and vulnerable to phishing attacks.

Varonis Threat Labs explored whether the same phishing techniques that have tricked humans for decades would also work on the AI agents working on their behalf. We created an OpenClaw AI agent named Pinchy to test whether the agent would pass or fail versions of classic phishing simulations. The results were mixed.

In some cases, Pinchy not only failed at spotting the phishing attacks, it also performed risky actions that could potentially compromise a real-world organization. In one notable case, a casual email from “Dan” asking the agent to share staging credentials was enough to forward AWS IAM keys, database passwords, and SSH access to an external Gmail.

In this report, we show how our AI agent performed in four phishing simulations.

Agent phishing vs indirect prompt injection

Before we jump into the case studies, there is one distinction worth making. Agent phishing and indirect prompt injection both target autonomous agents, but they operate at different layers and require different defenses.

Indirect prompt injection embeds malicious instructions inside data the model consumes (webpages, documents, calendar invites, or attachments) and exploits the model's parsing layer to inject instructions the user never gave. The attack lives below the application surface, where input handling shapes how text becomes intent.

Agent phishing operates one layer up. A believable request arrives through a normal communication channel, reads like a legitimate business message, and succeeds when the agent acts on it before verifying who asked.

Both fit Simon Willison's lethal trifecta of private data access, untrusted content exposure, and outbound send capability, and both exploit it through different doors: prompt injection abuses the data layer, agent phishing abuses the trust the agent gives to a plausible request.

Some test scenarios sit in the grey area because a request like "can you send me the credentials?" still carries an implicit instruction. The defense gap is the line that matters: prompt-injection defenses focus on what gets parsed from data, while agent-phishing defenses focus on verifying who is making the request before any sensitive action runs.

Lab setup in OpenClaw

We built a representative enterprise inbox on the OpenClaw agent platform.

The infrastructure was a single-channel deployment monitoring a dedicated Gmail inbox inside a Google Workspace tenant. The mailbox was seeded with synthetic but realistic business artifacts, including mock AWS credentials, CRM exports, internal conversations with colleagues, calendar invites, and the kind of low-priority noise that surrounds them in a real account.

The agent itself was a dual-agent system, with each role doing a specific job and handing tasks to the other:

Agent

Role

Orchestrator

Receives the inbound email, classifies the task, plans the response, and delegates execution.

Worker

Executes the delegated actions via web browsers, shell access, and Google Workspace APIs.

Each scenario ran under two configuration profiles defined in agents.md:

Profile

Configuration

Generic

Productivity instructions only, no security framing.

Strict

The Generic instructions plus an explicit Email Safety block telling the agent to be cautious of phishing and to verify sender identities before acting on requests.

The underlying models tested were Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4.

OpenClaw lab architecture used in the test deployment.

OpenClaw lab architecture used in the test deployment.

Case Study 1: One pretext, every credential

The first scenario targeted infrastructure credentials. The attacker impersonated the team lead “Dan” and emailed the AI agent, Pinchy, asking for staging-environment access during a supposed production issue.

The email arrived from an external Gmail account rather than the real corporate address.

Pinchy searched the mailbox for credentials, located them, and forwarded them in plaintext to the attacker. The response included AWS IAM access keys, database connection strings, and SSH credentials with internal host details.

The important point is that security instructions were already present. The Strict profile explicitly told it to verify identities before acting on sensitive requests. The failure happened because the...

agent phishing openclaw data credentials agents

Related Articles