AI Is the Best Thing to Happen to Security

June 28, 2026 / 6 mins read / By Chandrapal Badshah

LLMs have been around for a while now. When Anthropic released a statement that nation-state attackers were using Claude for attacks, I read it with a lot of skepticism.

Back then, I had to beg models to invoke tools the right way. Getting a model to pass valid input for function calling (tool calling) was painful. I couldn’t see how attackers were getting any real value out of these models, let alone “autonomously” hacking the world.

But over the last six months, something changed. Tool calling became a solved problem for larger models. Models started producing structured output reliably. And AI labs are pushing the same thinking and tool calling capabilities down to smaller models that can run on your laptop.
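To make “structured output” concrete, here is a minimal sketch of what a harness does with a model’s tool call before running anything: parse the JSON and check it against the tool’s declared parameters. The tool name `http_get`, its parameters and the `validate_tool_call` helper are hypothetical, made up for illustration and not taken from any vendor’s API.

```python
import json

# Hypothetical tool registry: tool name -> required parameter names and types.
TOOLS = {
    "http_get": {"url": str, "timeout_s": int},
}

def validate_tool_call(raw: str) -> dict:
    """Parse a model's tool call and reject anything malformed."""
    call = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    schema = TOOLS[name]
    missing = schema.keys() - args.keys()
    extra = args.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise ValueError(f"{key} must be {expected.__name__}")
    return call
```

When models were unreliable, a check like this failed constantly and the harness had to retry or re-prompt. When the model emits valid calls almost every time, the same check becomes a formality.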

That shift was massive for truly autonomous agents on the offensive side. Agents can now reliably use tools, edit raw HTTP requests, follow source to sink, read outputs, store memory and decide what actions to take. The models are still non-deterministic, but their reliability has improved massively.

The monkey with a button

Let’s take a slight detour and do a thought experiment.

Do you think a monkey can hack websites?

Go back a decade and give a monkey a button. Every time the monkey presses it, SQLmap runs against random targets on the internet. The monkey could hack into many vulnerable targets, thanks to the tool.

However, the bottleneck was the tool. SQLmap follows a fixed set of checks. Throw in some custom logic - endpoints behind authentication, multiple user roles, heavy JavaScript - and the tool misses it. The monkey keeps pressing the button but nothing happens, even if there’s a valid second-order SQL injection in the application.

Now replace SQLmap with an AI harness (OpenCode, Codex, etc.).

The harness reads the response from the target and the underlying LLM adapts the course of action. It switches tools. If SQLmap doesn’t work, it tries something else. If it finds there are API endpoints to create users with different roles, it executes curl commands to create them. It chains multiple tools together based on what it sees. The AI doesn’t replace tools like SQLmap - it orchestrates them on the fly.
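The difference between a fixed tool and a harness is the feedback loop, which can be sketched in a few lines. Everything below is simulated: the three tools return canned strings and touch no network, and `choose_next` is a hard-coded stand-in for the decision an LLM would make from the history. All names here are hypothetical and are not part of OpenCode, Codex or any real harness.

```python
from typing import Callable, Optional

# Simulated tools: each reads/writes shared state and returns an observation.
def fixed_scanner(state: dict) -> str:
    return "blocked: login required"

def create_test_user(state: dict) -> str:
    state["session"] = "test-user"
    return "ok: session created"

def authenticated_check(state: dict) -> str:
    if state.get("session"):
        return "finding: candidate issue"
    return "blocked: login required"

TOOLS: dict[str, Callable[[dict], str]] = {
    "fixed_scanner": fixed_scanner,
    "create_test_user": create_test_user,
    "authenticated_check": authenticated_check,
}

def choose_next(history: list[tuple[str, str]]) -> Optional[str]:
    """Stand-in for the LLM: pick the next tool from the last observation."""
    if not history:
        return "fixed_scanner"
    tool, observation = history[-1]
    if observation.startswith("finding"):
        return None  # done, report the candidate
    if observation.startswith("blocked"):
        return "create_test_user"  # adapt instead of repeating the same check
    if tool == "create_test_user":
        return "authenticated_check"
    return None

def run(max_steps: int = 5) -> list[tuple[str, str]]:
    state: dict = {}
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):
        tool = choose_next(history)
        if tool is None:
            break
        history.append((tool, TOOLS[tool](state)))
    return history
```

Calling `run()` walks through scanner, user creation, then the authenticated check. A button wired only to `fixed_scanner` would return the same "blocked" observation on every press.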

Yes, AI agents are non-deterministic. Same target, different results each time. Some runs find real issues, some produce garbage. But hit or miss is still better than 100% miss. Over enough button presses, the monkey with the harness finds more vulnerabilities than SQLmap ever could.
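The “enough button presses” argument is simple arithmetic if each run is treated as an independent trial with success probability p: the chance of at least one hit in n runs is 1 - (1 - p)^n. The independence assumption and the value p = 0.05 used below are illustrative only, not measurements of any real agent.

```python
import math

def p_at_least_one_hit(p: float, n: int) -> float:
    """Probability that at least one of n independent runs succeeds."""
    return 1.0 - (1.0 - p) ** n

def runs_needed(p: float, target: float) -> int:
    """Smallest n with P(at least one hit) >= target, for 0 < p < 1 and 0 < target < 1."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
```

Under these assumptions, a run that succeeds only 5% of the time gives roughly a 92% chance of at least one hit after 50 presses, and `runs_needed(0.05, 0.9)` returns 45. A fixed tool that cannot reach the bug at all has p = 0, so the probability stays at zero however many times the button is pressed.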

The ceiling of what the button could do went up.

Guardrails are a feature, not a guarantee

You might be thinking - don’t these popular LLMs have safety guardrails? Won’t these models just reject your request if you tell them to “Hack this domain”?

They do reject. Claude refuses harmful requests. So do GPT models and others. Jailbreaking is a cat-and-mouse game that labs keep patching.

But that’s only half the picture.

Capable open-weight models like DeepSeek and GLM can be self-hosted. And tools like heretic permanently strip safety and censorship from model weights.

We can take these uncensored open-weight models and simply point them at sites, without wasting time convincing the agent that it is authorized to attack or that you’re the CISO of the target company. Closed-weight models might be better than the open-weight alternatives, but uncensored self-hostable models are still going to achieve the goal.

Code is cheap

Then there’s the other side of the equation. The stuff being attacked. The stuff the security industry gets paid to “secure.” The code itself.

When creating code is cheap, a lot gets created at a much faster pace. More APIs. More endpoints. More infrastructure. More attack surface.

It’s not just the volume that increases but also the complexity. AI helps teams ship faster, but faster doesn’t mean simpler. Teams pick frameworks and stacks based on LLM recommendations, especially when they’re still learning how to code. Teams with less coding experience ship production systems using AI assistance - changing the underlying threat model without even realizing it. New tech gets adopted, existing systems get entangled, and assumptions pile up.

More surface. More programming languages. More dependencies. More shadow IT. More complexity. More things that can go wrong.

Defenders have the same tools. Not the same game.

Defenders can use AI tools to attack and secure their codebases.

AI-powered SAST, anomaly detection, automated triage, AI code review. Or just run Claude Code in a loop, while also giving it tooling to determine whether a detected issue is actually exploitable or just a missing best practice that doesn’t apply to your use case.

But there are two asymmetries that make this game fundamentally unfair.

Asymmetry #1: attackers aim for wins, defenders aim to prevent losses.

An...
