Safe Ways to Use AI Agents

At Renuo we started using AI coding agents (like Claude Code, OpenCode, or Antigravity) for development, and I also started using them for personal projects like my Raspberry Dashboard. Additionally, we started building our own AI agent, which is integrated into Redmine, our ticketing and project management system.

In the process, we became aware of the security risks involved: by default, these agents run with full user permissions. They can read and write files, execute commands, and access credentials on the host system.

Johann Rehberger's 39c3 talk "Agentic ProbLLMs: Exploiting AI Computer-Use and Coding Agents" shows how prompt injection can lead to remote code execution and credential exfiltration in agents like Claude Code and GitHub Copilot. I recommend watching it.

In this post, we'll look at these risks and at the pragmatic solutions we came up with to strike a balance between developer experience and security.

Risks of AI Agents

LLMs are probabilistic -- a 1% chance of disaster makes it a matter of when, not if.

-- Agent Safehouse
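The quote is really a statement about repeated trials. As a rough illustration, assume each agent session independently has a 1% chance of going badly wrong (the figure from the quote, not a measured rate). The probability P_n of at least one disaster over n sessions is then:

```latex
\begin{aligned}
P_n     &= 1 - (1 - 0.01)^{n} = 1 - 0.99^{n} \\
P_{100} &= 1 - 0.99^{100} \approx 0.63 \\
P_{300} &= 1 - 0.99^{300} \approx 0.95
\end{aligned}
```

Under that assumption, a few hundred sessions make at least one incident close to certain.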

Most AI coding agents run with the same permissions as the user who started them. They have access to the file system, can execute arbitrary shell commands, and inherit all credentials available in the environment. Since LLMs are susceptible to prompt injection (malicious instructions hidden in code, documentation, or web content), this creates a real attack surface.
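The environment inheritance is easy to observe. Below is a minimal Python sketch (our own illustration, not code from any agent): a child Python process stands in for the agent, a fake REDMINE_API_KEY is set, and the child sees it by default but not when launched with a scrubbed environment. The names in SECRET_SUFFIXES are an assumed heuristic; they will miss secrets with other names and do nothing about credentials stored in config files.

```python
import os
import subprocess
import sys
from collections.abc import Mapping
from typing import Optional

# Assumed heuristic: variable names that usually hold secrets. Not exhaustive.
SECRET_SUFFIXES = ("_KEY", "_TOKEN", "_SECRET", "_PASSWORD")

# Stand-in for an agent: a child process that reports whether it can see the key.
CHILD_CODE = "import os; print('REDMINE_API_KEY' in os.environ)"


def child_sees_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """Run the child; env=None means it inherits this process's environment."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


def scrubbed_env(environ: Mapping[str, str]) -> dict[str, str]:
    """Copy the environment, dropping variables whose names look like credentials."""
    return {
        name: value
        for name, value in environ.items()
        if not name.upper().endswith(SECRET_SUFFIXES)
    }


if __name__ == "__main__":
    os.environ["REDMINE_API_KEY"] = "fake-key-for-demo"  # fake secret, demo only
    print(child_sees_key())                          # prints True: inherited by default
    print(child_sees_key(scrubbed_env(os.environ)))  # prints False: dropped before launch
```

A scrubbed environment reduces what an agent can leak, but it is not isolation: the agent can still read config files and credential stores in your home directory.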

Image source: embracethered.com by Johann Rehberger

The risks boil down to a few categories:

Exposing credentials: Agents have access to environment variables, config files, and credential stores. A prompt injection can trick an agent into exfiltrating API keys or access tokens to an attacker-controlled server.

Malware installation: Agents can be tricked into downloading and executing malicious code, for example through poisoned dependencies or malicious instructions in README files.

Destructive actions on the local machine: An agent might delete files, overwrite configurations, or corrupt a local database -- through prompt injection or simply by making a wrong decision.

Destructive actions on remote systems: Agents often have access to CLI tools that can interact with production infrastructure. Think kubectl delete, terraform destroy, nctl delete, or even just a database client connected to production.
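The remote-system commands listed above share a simple shape: a tool name followed by a destructive subcommand. A minimal Python sketch of a pre-execution check that flags them could look like this. The prefix list is hypothetical and only covers the examples named above, and the shell parsing is deliberately rough.

```python
import shlex

# Hypothetical denylist based on the examples above; extend it for your own tools.
DESTRUCTIVE_PREFIXES = [
    ("kubectl", "delete"),
    ("terraform", "destroy"),
    ("nctl", "delete"),
]

# Shell operators that separate one simple command from the next.
SEPARATORS = {"&&", "||", ";", "|", "&"}


def simple_commands(command_line: str) -> list[list[str]]:
    """Split a shell line into simple commands (rough: ignores subshells, bash -c, aliases)."""
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split only on whitespace and shell operators
    commands: list[list[str]] = [[]]
    for token in lexer:
        if token in SEPARATORS:
            commands.append([])  # an operator starts a new simple command
        else:
            commands[-1].append(token)
    return [cmd for cmd in commands if cmd]


def looks_destructive(command_line: str) -> bool:
    """True if any simple command in the line starts with a denylisted prefix."""
    return any(
        tuple(cmd[: len(prefix)]) == prefix
        for cmd in simple_commands(command_line)
        for prefix in DESTRUCTIVE_PREFIXES
    )


if __name__ == "__main__":
    print(looks_destructive("kubectl get pods"))                            # False
    print(looks_destructive("echo ok && terraform destroy -auto-approve"))  # True
```

Treat this as a tripwire rather than a security boundary: wrapping the same command in bash -c "..." or a script already slips past it.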

I asked some developers which incidents they have had with AI agents so far. Here are a few examples:

I don't have the session anymore, but while working on a Redmine integration, it found out that I had a REDMINE_API_KEY in my ENV variables and started fetching data from our production Redmine.

-- Alessandro Rodi

While it wasn't a major issue, it was frustrating when database migration errors caused the development database to be deleted and recreated, as I often lost test data I wanted to keep.

-- Bruno Costanzo

While I was testing our Claude Code skill to deploy web apps to deplo.io, the agent hit the quota limit for the number of apps in the test organization. To solve this, it decided it was best to delete existing apps with nctl delete app. It did ask for confirmation before going ahead, though.

-- Josua Schmid

We haven't yet had a case where things went seriously wrong, mostly because we don't let the agents run unattended and we use test environments. But these incidents were enough to make us think seriously about how to improve the situation.

For a deeper look at these attack vectors, see the 39c3 talk referenced in the introduction.

Mitigation Strategies

So what can we do about this? There are four broad strategies:

Hope: Instruct the agent not to do destructive things.

Manual approval: Configure the agent to ask for approval before every action.

Agent-specific configuration: Prevent the agent from reading certain files or executing certain commands.

Isolation: Run the agent in a VM, a Docker container, or a sandboxing tool.
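For the isolation option, a container is often the lowest-friction starting point. The following Python sketch only assembles a docker run command line; my-agent-image and my-agent are hypothetical placeholders for an image you would build with your agent installed. The idea is that the agent sees only the project directory and none of the host's environment variables, since docker run does not pass them in unless asked to.

```python
import shlex
from pathlib import Path


def docker_sandbox_argv(project_dir: str, image: str, agent_cmd: list[str]) -> list[str]:
    """Build a docker run command that exposes only the project directory to the agent."""
    project = Path(project_dir).resolve()
    return [
        "docker", "run",
        "--rm",                                 # delete the container when the agent exits
        "--interactive", "--tty",               # keep the agent's terminal UI usable
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation (e.g. setuid)
        "--volume", f"{project}:/workspace",    # mount only the project, not the home directory
        "--workdir", "/workspace",
        # Deliberately no --env/--env-file: host variables such as API keys stay outside.
        image,
        *agent_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical image and command names; replace with your own agent setup.
    argv = docker_sandbox_argv(".", "my-agent-image:latest", ["my-agent"])
    print(shlex.join(argv))
```

Network access stays enabled here because the agent has to reach its model API, so an injected instruction could still send data out; the container limits what there is to steal, not whether something can be sent. Any credential the agent genuinely needs, such as its own model API key, would have to be passed in explicitly.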

Hope / Prompt Begging

The major issue with simply asking the LLM in the prompt not to do destructive things is that it may not work: LLMs are probabilistic, and instructions hidden in injected content can override ours.

Image source: agent-safehouse.dev

Manual approval

While manually approving everything the agent does sounds secure, in practice it leads to approval fatigue: repeatedly approving actions causes us to pay less attention to what we're actually approving.

It also kills productivity: constant interruptions prevent agents from running in the background.

There is also the risk of granting permissions too broadly: at one point while trying out Antigravity, I accidentally allowed every bash command instead of only the one the agent had requested. Since the agent then just kept executing commands, I had to stop it.

Agent-specific configuration

Most agents can be configured with allow and deny patterns for actions. Claude Code's permission system, for example, lets you pattern-match shell commands:

{
  "permissions": {
    "allow": [
      "Bash(git commit *)"
    ],
    "deny": [
      "Bash(git push *)"
    ]
  }
}

This will allow git commit but block...
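To see how allow and deny rules interact, here is a minimal Python sketch that evaluates rules written like the settings above using shell-style globbing (fnmatch). It illustrates the idea only and is not Claude Code's actual matcher; we assume that deny rules take precedence and that anything unmatched falls back to manual approval.

```python
from fnmatch import fnmatchcase

# Rules written like the settings above (illustrative values).
ALLOW = ["Bash(git commit *)"]
DENY = ["Bash(git push *)"]


def decide(command: str, allow: list[str] = ALLOW, deny: list[str] = DENY) -> str:
    """Return "deny", "allow" or "ask" for a shell command (simplified semantics)."""
    request = f"Bash({command})"
    # Assumption: deny rules take precedence over allow rules.
    if any(fnmatchcase(request, rule) for rule in deny):
        return "deny"
    if any(fnmatchcase(request, rule) for rule in allow):
        return "allow"
    # Assumption: anything not covered by a rule falls back to manual approval.
    return "ask"


if __name__ == "__main__":
    print(decide("git commit -m 'wip'"))   # allow
    print(decide("git push origin main"))  # deny
    print(decide("rm -rf build"))          # ask
    # Pitfall: '*' also matches shell operators, so this chained push is allowed.
    print(decide("git commit -m wip && git push origin main"))  # allow
    # Pitfall: a catch-all rule approves everything.
    print(decide("rm -rf build", allow=["Bash(*)"], deny=[]))   # allow
```

The last two calls show two failure modes: a naive glob lets a chained command slip through an allow rule, and a catch-all rule like Bash(*) is exactly the over-permissive approval from the Antigravity anecdote. Check how your agent handles shell operators before relying on such rules.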
