The Asymmetric Future of AI in Cybersecurity
Cybersecurity has always had an interesting property: the same knowledge can be used either to protect a system or to compromise it. A proof-of-concept exploit can help a vendor reproduce and patch a vulnerability, or help an attacker weaponize it before users update their systems.
None of this is new. What is changing is the speed, scale, and accessibility with which these actions can now occur.
This post is slightly different from the previous ones. Rather than explaining a specific technical concept, the goal of the first part is to bring some order to the current and near-future relationship between AI and cybersecurity. In the second part, I will try to make some reasoned predictions that go beyond simply betting on red or black at a roulette table.
If you already know the AI 101 material, feel free to skip ahead to the next section, about local and open models.
Before discussing agents, it is worth giving a quick (and admittedly boring) clarification of what the term actually means, because it is rapidly becoming one of the most overloaded concepts in AI. An AI agent is a system capable of reasoning over a task, interacting with tools, and executing multi-step actions toward an objective. In cybersecurity, this could range from an agent that enumerates an attack surface and chains vulnerabilities together to one that automatically triages alerts or responds to incidents.
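To make the definition concrete, here is a minimal sketch of the loop behind an alert-triage agent: the model picks a tool, the tool's output is fed back as an observation, and the loop repeats until the model decides to answer or a step budget runs out. Everything in it is hypothetical: the two tools are stubs, and `stub_model` is a hard-coded rule standing in for the LLM that would normally choose the next action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical tools; a real triage agent would query a SIEM, EDR, or threat-intel feed.
def lookup_ip_reputation(ip: str) -> str:
    # Stub: treat the 203.0.113.0/24 documentation range as known-bad.
    return "malicious" if ip.startswith("203.0.113.") else "unknown"


def count_failed_logins(user: str) -> str:
    # Stub: a fixed answer for illustration only.
    return "42" if user == "admin" else "0"


TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_ip_reputation": lookup_ip_reputation,
    "count_failed_logins": count_failed_logins,
}


@dataclass
class Step:
    tool: Optional[str]  # None means "stop and return an answer"
    arg: str = ""
    answer: str = ""


def stub_model(alert: dict, history: List[Tuple[str, str]]) -> Step:
    """Stand-in for the LLM: chooses the next action from the alert and past observations."""
    used = {tool for tool, _ in history}
    if "lookup_ip_reputation" not in used:
        return Step("lookup_ip_reputation", alert["src_ip"])
    if "count_failed_logins" not in used:
        return Step("count_failed_logins", alert["user"])
    observations = dict(history)
    if (observations["lookup_ip_reputation"] == "malicious"
            and int(observations["count_failed_logins"]) > 10):
        return Step(None, answer="escalate")
    return Step(None, answer="close")


def run_agent(alert: dict, max_steps: int = 5) -> str:
    """Observe-decide-act loop: call tools until the model answers or the budget is spent."""
    history: List[Tuple[str, str]] = []
    for _ in range(max_steps):
        step = stub_model(alert, history)
        if step.tool is None:
            return step.answer
        observation = TOOLS[step.tool](step.arg)
        history.append((step.tool, observation))
    return "undecided"


print(run_agent({"src_ip": "203.0.113.7", "user": "admin"}))   # escalate
print(run_agent({"src_ip": "198.51.100.1", "user": "alice"}))  # close
```

With a real model, `stub_model` would be replaced by an LLM call that receives the alert and the tool history and returns the next action; the `max_steps` budget is one simple way to keep an agent from looping indefinitely.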
At the time of writing, AI agents are not capable of conducting fully autonomous, end-to-end cyber operations with consistently reliable success rates. However, their capabilities are advancing quickly, especially in coding, code vulnerability detection, and biology, and both their consistency and their skill are expected to improve over time.
The dual-use nature of cybersecurity also introduces a hard problem: intent. In clearly malicious scenarios, such as requests for ransomware deployment scripts or credential-stealing malware, intent is relatively easy to classify. The challenge lies in the much larger grey area, where the exact same technical action may be legitimate or harmful depending entirely on the context.
This creates an intrinsic problem for language models, because intent is ultimately expressed through language, and the model only sees what the user chooses to write. For example, I can ask an LLM to help me validate a bug bounty report, or I can wrap the same request in a CTF narrative to bypass its guardrails and exploit that vulnerability.
To mitigate this problem, LLMs typically rely on a combination of safety alignment techniques, policy-based filtering, reinforcement learning from human feedback (RLHF), and runtime monitoring systems designed to identify harmful intent or dangerous outputs. Safety mechanisms usually operate as moving thresholds: tightening restrictions too aggressively risks making the model unusable for legitimate security researchers, developers, and defenders, while relaxing them too much lowers the barrier for malicious actors and makes bypasses easier.
This tension is one of the reasons why newer models, such as Anthropic’s Claude Mythos and OpenAI’s GPT-Cyber, have introduced forms of controlled or trusted access. Some have criticized this as partly a marketing move to build hype, but regardless of the motivation, even organizations that publicly advocate broad accessibility are beginning to introduce verification layers or identity-based access controls.
The Central Role of Local Models
Much of the public discussion around AI safety assumes that access controls can meaningfully limit offensive capabilities. In practice, this assumption weakens with each passing month. The long-term challenge for centralized control is not frontier cloud models, where providers can enforce access controls and usage policies, but the rise of local models, which run on hardware the user controls and therefore have no provider in the loop to enforce anything. These are likely to play the central role in the future.
At the time of writing, powerful models such as Mythos 5 and GPT 5.6 Sol have been restricted by US authorities, with access allowed only to a small set of approved companies. This shows how these tools are becoming strategic assets, and it highlights the importance of high-quality open-source models that are not subject to such limitations.
At the same time, if capability is the concern, governments may eventually restrict open-weight models above a certain capability threshold. Once an open-weight model reaches performance comparable to state-of-the-art (SOTA) models, governments could attempt to regulate it just as they have tried to regulate closed models. Such bans could be justified on security grounds while also protecting the revenue of the largest LLM companies.
Bans on open-source products have often been largely symbolic, since copies and mirrors cannot realistically be recalled; the attempts to restrict strong encryption in the 1990s are a classic example. With open-weight models, however, there is an important difference.
Running open-weight...