Red teamers turned Claude Desktop into a double agent to do their evil bidding

Security

People trust their AI assistants and it's easy to abuse this trust

Jessica Lyons

Published Wed 1 Jul 2026 // 18:00 UTC

EXCLUSIVE Pentera Labs’ red teamers compromised a developer’s AI agent via his Claude Desktop app and ultimately turned that access into full remote code execution on the dev’s machine – demonstrating how an attacker could turn a trusted, chatty AI assistant into a double agent operating on their behalf.

“Claude’s got a new voice,” Pentera's offensive security services team leader Dvir Avraham told The Register. “We acknowledge the huge trust in AI models – everybody uses them,” he said in a phone interview. “We used this trust to manipulate the victim, like under the hood, the victim didn't see it coming.”

It also prompted Avraham to check his own platforms. “I became a little bit paranoid,” he told us. “I'm not allowing any command to run without me examining it twice.”

In a report set to publish Wednesday, and shared in advance exclusively with The Register, Avraham and research technical lead Reef Spektor detailed the attack and what it means for organizations using agentic AI tools with local code-execution access.

It began with a red-team assignment on a third-party platform that aggregates customer email inboxes into a single management interface. Avraham and Spektor won’t name the platform, or tell us exactly how they gained access to it. They used this compromised inbox – and told us any compromised inbox would work – to get into the victim’s Claude account.

As the duo noted, breaking into an email inbox in real life – via a third-party management platform, phishing link, social engineering password reset, or even using AI agents – isn’t too difficult. “AI agents today have access to connectors and to direct MCPs into inboxes,” Spektor added.

In addition to this prerequisite (a compromised inbox), the attack chain also requires the victim to have Claude Desktop installed. Anthropic’s desktop app works across macOS, Windows, and Linux systems. It provides the same chat interface as claude.ai, and it also syncs across all devices and sessions tied to the user’s account. “We asked ourselves, can we leverage the sync behavior to infect other sessions and devices? (hint: yes!),” the red teamers wrote in the Wednesday report.

Back to the AI Stone Age

As of January, the desktop app also includes Cowork for longer agentic tasks, and Code for software development. So, for example, a user can send Claude a task from their phone and instruct it to work on their computer. As Anthropic says: “Anything you can do on your computer, Claude can do. Open apps, fill spreadsheets, navigate your browser. No setup, no passwords handed off.” The Cowork feature now makes Pentera Labs’ attack scenario even easier.

However, when the security analysts were doing this research in November 2025, “back in the Stone Age in terms of AI, you didn't have Cowork or Claude Code, so we needed a way to actually execute commands because we wanted to take over the machine,” Avraham said.

For this part, they took a keen interest in Claude Desktop’s personalization features. These are account-wide settings that tell the AI agent the user’s preferred approach and general communication instructions, along with more specific project instructions, such as guidelines for a particular workflow, or defined roles Claude should adopt within a project.

The red teamers developed a base64-encoded prompt that instructed Claude to check for command-capable tools on the developer’s machine and execute the command if available, or produce a fake error message if not, prompting the user to download a tool that will execute the attacker’s commands. Then they pasted the prompt into the victim’s personal preferences on Claude, and this prompt syncs across all of the user’s devices. This ensures that the next time the user opens Claude Desktop and types in a chat, the poisoned instructions are loaded into their preferences and will silently run behind the scenes.
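
For readers who want to check their own settings, here is a minimal defensive sketch of how base64-encoded text hidden in a block of preference text might be flagged. It is not Pentera's tooling and does not call any Anthropic API: the function name `find_encoded_instructions`, the 40-character threshold, and the idea of pasting your preferences into a string are all assumptions made for illustration.

```python
import base64
import binascii
import re

# Runs of 40 or more base64 characters, optionally padded.
# The threshold is an arbitrary assumption; ordinary prose rarely matches it.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def find_encoded_instructions(preferences: str) -> list[str]:
    """Return the decoded text of each base64-looking run that decodes to readable UTF-8."""
    hits = []
    for match in B64_RUN.finditer(preferences):
        blob = match.group(0)
        try:
            decoded = base64.b64decode(blob, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            # Not valid base64, or not text once decoded: skip it.
            continue
        # Keep only results that look like human-readable text.
        if decoded and all(ch.isprintable() or ch.isspace() for ch in decoded):
            hits.append(decoded)
    return hits


if __name__ == "__main__":
    # Hypothetical preference text copied out of a settings page by hand.
    hidden = base64.b64encode(b"placeholder for an instruction the user never wrote").decode()
    prefs = f"Prefer concise answers. {hidden} Use British spelling."
    for text in find_encoded_instructions(prefs):
        print("Decoded blob found in preferences:", text)
```

A check like this only catches one encoding. Any preference text you don't remember writing deserves a manual look, whether or not it is encoded.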

The user thinks they are simply interacting with Claude as usual. They don’t see Claude checking to see what extensions and tools are installed. If the user already has Desktop Commander or a similar MCP connector or extension...
