Captured Logs Reveal Hackers Using Claude and Codex to Breach Companies


Captured Logs Reveal Hackers Using Claude and Codex to Breach Companies | OALABS Research

Overview

Policy Violations

Stealing Claude

Operational Security Failure

Agentic Hacking

Prompt Workflow

N-Day Exploit Development

Monetization

Bitcoin Wallet Theft

Access Broker Research

Conclusion

Appendix A - Post Compromise Timeline

Overview

Earlier this month, a friend of OALABS reached out with an interesting situation. A server of theirs had been compromised, and the attacker was using it as a staging host to carry out further attacks. Our friend was able to download the attacker's working directory before cleaning up the host and noticed that the attacker was using the Anthropic Claude Code agent to drive most of their attacks. OpenAI's Codex agent was also used to a limited extent.

During our analysis of the recovered working directory, we discovered that the attacker was not just using the host as a proxy; they had full Claude and Codex agents installed locally and were using them remotely to carry out reconnaissance, exploitation, and data exfiltration activities. Because the agents were local to the host, their full session logs were recovered, including the attacker's prompts, the tools used, the internal monologue of the large language model (LLM), and any policy violations recorded during the sessions. In total, we collected more than 1,000 Claude and Codex agent sessions, so many that we had Claude (ironically) develop a session-log forensics tool, ASF Triage, to help with the scale of the analysis. In addition to the session logs, we discovered a myriad of LLM-developed tools, artifacts, and logs detailing the breach of at least 14 companies.

Policy Violations

Before we get into the analysis of how the LLMs were used to carry out these attacks, it is important to address the elephant in the room: why didn't the LLM safeguards prevent this? It's no secret that AI safeguards commonly get in the way of benign tasks when they are even remotely adjacent to cybercrime. In fact, we ran into multiple Claude policy violations simply attempting to build our ASF Triage log forensics tool. However, in more than 1,000 attacker sessions, Codex (gpt-5.2-codex) emitted only one policy violation, and Claude (opus-4.5) emitted nine.

The use of older models may have contributed to the LLMs' willingness to carry out attacks, but the prompts provide a clear picture: the attacker framed all requests as part of an authorized redteam engagement. When a rare policy violation was encountered, the attacker simply reframed the request with less aggressive wording and more emphasis on its being part of an authorized redteam exercise. As we discovered years ago while investigating the Leaked Conti Ransomware Playbook, in many cases the only thing that differentiates a legitimate redteam exercise from a ransomware incident is who pays for the report, and it appears this holds true for LLMs as well. In one particularly illustrative session, the attacker used Claude to help estimate the potential ransom value of the information stolen across multiple compromises, framing the question as redteam "cyber security research". Claude helpfully ranked the companies with projected dollar amounts in a report titled "Goldmine".

Editor's Note (Sergei): As a professional reverse engineer, another dual-use profession, I have personally experienced the frustration of working around false-positive policy violations. I would advocate against further crippling these models with additional false positives for legitimate redteam activity. All of the activity detailed in this report was carried out with models that are at least one generation behind the current frontier models and can likely be replicated with less policy-restrictive models such as Kimi. On top of this, it is not clear whether humans can even differentiate between legitimate redteaming tasks and actual hacking, let alone LLMs.

Stealing Claude

An initial finding from our analysis was that the Claude agent had been copied onto the host rather than installed. File timestamps in the Claude directory indicated that the Claude agent had been in use for months prior to the compromise, and the session directories included session logs and artifacts from projects that had been active several months before the compromise.

Using ASF Triage to arrange the sessions in chronological order, a clear picture emerged. The Claude instance had previously belonged to a software developer who was using Claude remotely on a Hetzner host to work on website design and other assorted benign projects. On February 2, 2026, the developer's Claude host was compromised, and on February 16, 2026, the entire Claude server was copied to an attacker-controlled Vultr host. We know this because Claude was used in the compromise, and the activity is recorded in the agent session logs. The logs indicate that both the developer and the attacker were using the same Claude instance...
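The chronological ordering described above can be illustrated with a minimal sketch. This is not the ASF Triage implementation, which is not shown here. It assumes each session log is a stream of JSON lines, and the field names `sessionId` and `timestamp` (an ISO-8601 string) are hypothetical choices made for illustration only.

```python
import json
from datetime import datetime


def order_sessions(jsonl_lines):
    """Return (session_id, first_seen) pairs sorted by earliest timestamp.

    Assumes hypothetical record fields "sessionId" and "timestamp";
    real agent logs may use different names or layouts.
    """
    first_seen = {}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or damaged lines
        if not isinstance(record, dict):
            continue
        session = record.get("sessionId")
        stamp = record.get("timestamp")
        if session is None or not isinstance(stamp, str):
            continue
        # Normalise a trailing "Z" so older Python versions can parse it.
        when = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if session not in first_seen or when < first_seen[session]:
            first_seen[session] = when
    # Sort sessions by the first time each one was seen.
    return sorted(first_seen.items(), key=lambda item: item[1])
```

Feeding the function the lines of every recovered log file yields one entry per session, earliest first, which is enough to see where one operator's activity ends and another's begins.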
