Post-Mythos Cybersecurity: Keep calm and carry on

Versipelle1 pts1 comments

Cybersecurity in the post-mythos era: Keep calm and carry on!

Sign in<br>Subscribe

I have seen a lot of distressed debate in the cybersecurity field following the announcement of Claude Mythos Preview. It was announced as a game changer in the field, miles ahead of its league and opening the pandora’s box of fully automated hunting and exploitation of zero-days.<br>Since then, Mythos and it’s safeguard-heavy equivalent, Fable 5, got released, only to be taken away shortly after. Let’s take the opportunity to reflect on what this model brings and how impactful it is to the industry.

Keep Calm

Fear, uncertainty, and doubt fuels the Cybersecurity industry<br>Anthropic has always had a taste for dramatic phrasing in its PR. Every major model release is accompanied by concerns on its safety; calling for regulation or for a pause in research before we reach a point of no-return. Mythos makes no exception to this trend and was disclosed in April without a public release. Instead, project Glasswing was announced, gatekeeping access to the model to 50 organisations, later expanded to 150 entities. Some of those lucky few corroborated the alarmist statements from Anthropic. They announced hundreds of vulnerabilities detected thanks to Mythos. One of the most impactful article on the topic was the evaluation from the AI Security Institute from the UK Government. Mythos was the first model to ever succeed in “expert level tasks”. It was also the first of its kind to achieve “The Last One”, a cyber-range testing the entire attack chain from reconnaissance to full network takeover.<br>Reading the article in details depicts a less dramatic picture. While a step up from previous models, progress in this area has been very gradual. We can see GPT-5.4, or even Opus 4.6, not so far behind on their Advanced CTF Challenge. The same can be said on their cyber range for Opus. Those benchmarks can also be quite far from realistic enterprise environment, at least for companies with mature cybersecurity programme and dedicated SOC. As the article stresses out, “They lack security features that are often present, such as active defenders and defensive tooling. There are also no penalties for the model for undertaking actions that would trigger security alerts.” No doubt such models would sometimes be extremely noisy and clumsy while attempting reconnaissance tasks or pivoting into the target’s information system.<br>“Sure, but what about all those critical vulnerabilities the model can find offline. They could then be exploited by attackers as powerful zero-days”, you may ask? This aspect was the main marketing argument coming with Project Glasswing, with example such as a “27-year-old vulnerability in OpenBSD” or a “16-year-old vulnerability in FFmpeg”.<br>Security professionals would probably smirk while reading such statements. Highlighting a vulnerability is old enough to drive is a very common clickbait trick for CVE announcement, only second to the classic “CISA orders feds to patch X”. A vulnerability being decades old is not that uncommon in open source products with hundreds of thousands of lines of code. Most of the time, It just means nobody skilled enough to spot it ever looked in this area before. Old bugs are more valuable as they impact more versions of the supporting software, but that has nothing to do with how difficult they were to find in the first place. What's true, however, is that AI-assisted discovery will increase their prevalence.<br>Mythos, only a gradual improvement of the older models?<br>The biggest change with Mythos is the scalability potential of organisations with deep pockets to afford those exhaustive searches. The blog post from Anthropic red team gives more insight on how they achieve such results, and it would definitely be costly. The model was run, several times, on most source code files individually. It took a thousand runs through their scaffold to get the BSD bug, for a cost of approximately 20,000 USD. The entire Glassing project has an allocated token budget worth a hundred millions dollars. Does it bring new risks? Yes, but for actors who probably already had advanced cybersecurity resources in the first place, not to the average script kiddie.<br>Earlier models might have spotted a portion of those vulnerabilities, had they benefitted from the same thorough experimentation. It’s hard to make apples-to-apples comparison as the details given by Anthropic on how Mythos was run (or how many times it ran for each finding) are scarce. Some tried to replicate the concept in fair but more cost-efficient alternatives and had some probing results. In a nutshell: in the absence of Mythos or even Opus models, DeepSeek is decent in the cloud hosting world while Gemma 4 and Qwen 3.6 punch well above their weights in the self-hostable category, finding about half the vulnerabilities Mythos spotted in the benchmark.<br>However, I wouldn’t go as far as Aisle who claimed the secret is in the harness, not the model. While they...

mythos model from cybersecurity while models

Related Articles