Safe Ways to Use AI Agents

At Renuo, we started using AI coding agents (like Claude Code, OpenCode or Antigravity) for development, and I also started using them for personal projects like my Raspberry Dashboard. In addition, we started building our own AI agent, which is integrated into Redmine, our ticketing and project management system.

Along the way, we became aware of the security risks involved. By default, these agents run with full user permissions: they can read and write files, execute commands, and access credentials on the host system.

Johann Rehberger's 39c3 talk "Agentic ProbLLMs: Exploiting AI Computer-Use and Coding Agents" shows how prompt injection can lead to remote code execution and credential exfiltration in agents like Claude Code and GitHub Copilot. I recommend watching it.

In this post we'll take a look at these risks and the pragmatic solutions we came up with to keep a balance between developer experience and security.

Risks of AI Agents

LLMs are probabilistic -- a 1% chance of disaster makes it a matter of when, not if. -- Agent Safehouse

Most AI coding agents run with the same permissions as the user who started them. They have access to the file system, can execute arbitrary shell commands, and inherit all credentials available in the environment. Since LLMs are susceptible to prompt injection (malicious instructions hidden in code, documentation, or web content), this creates a real attack surface.

Image source: embracethered.com by Johann Rehberger

The risks boil down to a few categories:

Exposing credentials: Agents have access to environment variables, config files, and credential stores. A prompt injection can trick an agent into exfiltrating API keys or access tokens to an attacker-controlled server.

Malware installation: Agents can be tricked into downloading and executing malicious code, for example through poisoned dependencies or malicious instructions in README files.

Destructive actions on the local machine: An agent might delete files, overwrite configurations, or corrupt a local database -- through prompt injection or simply by making a wrong decision.

Destructive actions on remote systems: Agents often have access to CLI tools that can interact with production infrastructure. Think kubectl delete, terraform destroy, nctl delete, or maybe even simply a database client connected to prod.
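The credential exposure described above comes down to how processes inherit their environment. The following is a minimal Python sketch of that mechanism, not a description of how any particular agent works: a child process sees every variable of its parent unless the parent filters them first. The name pattern in SECRET_NAME and the helpers scrub_env and leaked_names are hypothetical and only illustrate the idea; a name-based filter is a heuristic and will miss secrets stored in files or under unusual variable names.

```python
import os
import re
import subprocess
import sys

# Hypothetical heuristic: variable names that look like they hold a secret.
SECRET_NAME = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL", re.IGNORECASE)


def leaked_names(env):
    """Return the sorted names of variables a child process would inherit
    and whose names look like secrets."""
    return sorted(name for name in env if SECRET_NAME.search(name))


def scrub_env(env):
    """Return a copy of env without the variables whose names look like secrets."""
    return {name: value for name, value in env.items()
            if not SECRET_NAME.search(name)}


def child_sees(name, env):
    """Start a child Python process with env and report whether it can see name."""
    code = f"import os; print({name!r} in os.environ)"
    result = subprocess.run([sys.executable, "-c", code],
                            env=env, capture_output=True, text=True)
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    # Simulate a developer shell that has an API key exported (dummy value).
    env = dict(os.environ, REDMINE_API_KEY="dummy-value")
    print(child_sees("REDMINE_API_KEY", env))             # inherited as-is
    print(child_sees("REDMINE_API_KEY", scrub_env(env)))  # filtered first
```

With the unfiltered environment the child process can read the key; with the scrubbed copy it cannot. Starting an agent from a shell that never had the production credentials exported achieves the same effect without any code.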

I asked some developers which incidents they have had with AI agents so far. Here are a few examples:

I don't have the session anymore, but while working on a Redmine integration, it found out that I had a REDMINE_API_KEY in my ENV variables and started fetching data from our production Redmine.

-- Alessandro Rodi

While it wasn't a major issue, it was frustrating when database migration errors caused the development database to be deleted and recreated, as I often lost test-data I wanted to keep.

-- Bruno Costanzo

While I was testing our Claude Code skill to deploy web apps to deplo.io, the agent hit the quota limit for the number of apps in the test organization. To solve this, it decided it would be best to delete existing apps with nctl delete app. It did ask for confirmation before going ahead, though.

-- Josua Schmid

We haven't yet had a case where things went seriously wrong, mostly because we don't let the agents run unattended and we use test environments. But these incidents were enough to make us think seriously about how to improve the situation.

For a deeper look at these attack vectors see the 39c3 talk referenced in the introduction.

Mitigation Strategies

So what can we do about this? It boils down to the following strategies:

Hope: Instruct the agent not to do destructive things.

Manual approval: Configure the agents to ask before every action.

Agent specific configuration: Prevent the agent from reading certain files or executing certain commands.

Isolation: Run the agents in VMs, Docker containers or a sandboxing tool.

Hope / Prompt Begging

The major issue with merely prompting the LLM not to do destructive things is that it may simply not work.

Image source: agent-safehouse.dev

Manual approval

While manually approving everything the agent does sounds secure, in practice it leads to approval fatigue: repeatedly approving actions causes us to pay less attention to what we're actually approving.

It also kills productivity: constant interruptions prevent agents from running in the background.

There is also the issue of overly permissive approvals: at one point while trying out Antigravity, I accidentally allowed it to execute every bash command instead of only the one it had requested. Since the agent then simply kept executing commands, I had to stop it manually.

Agent specific configuration

Most agents can be configured with allow and deny patterns for actions. Claude Code's permission system, for example, lets you match shell commands against patterns:

"permissions": { "allow": [ "Bash(git commit *)" ], "deny": [ "Bash(git push *)"

This will allow git commit but block...
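To illustrate the general idea of allow and deny patterns, here is a minimal Python sketch. It is not Claude Code's actual matching logic: the decide function, the RULES dictionary, the use of fnmatch-style wildcards, the "deny wins over allow" order, and the "ask" fallback are all assumptions made for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical rule set mirroring the patterns from the configuration above.
RULES = {
    "allow": ["git commit *"],
    "deny": ["git push *"],
}


def decide(command, rules):
    """Classify a shell command string as 'deny', 'allow', or 'ask'.

    Deny patterns are checked first so that a broad allow pattern cannot
    override an explicit deny. Anything unmatched falls back to 'ask'.
    """
    if any(fnmatchcase(command, pattern) for pattern in rules["deny"]):
        return "deny"
    if any(fnmatchcase(command, pattern) for pattern in rules["allow"]):
        return "allow"
    return "ask"


if __name__ == "__main__":
    for cmd in ["git commit -m 'fix typo'", "git push origin main", "rm -rf build"]:
        print(f"{cmd!r}: {decide(cmd, RULES)}")
```

Note that this sketch matches the raw command string and does no shell parsing, so a chained command such as git commit -m x && git push origin main would match the allow pattern here. That is a limitation of the sketch and says nothing about how a real agent handles such input, but it shows why plain string patterns need careful testing before you rely on them.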
