
Claude Mythos is too dangerous to release. Here's why your IT team should still be worried.

Tom Hewitson · 9 min read

Yesterday Anthropic published the system card for Claude Mythos Preview, the most capable AI model anyone has ever built. It is substantially better than anything else at coding, reasoning, maths and cybersecurity. On some benchmarks it is not just incrementally better but in a completely different league.

And they have decided not to release it.

That decision alone should tell you something about where we are. This is the first time a major AI lab has published a full system card (a detailed technical report on a model's capabilities and risks) for a model it has chosen to keep behind closed doors. The reason is straightforward: Mythos is so good at finding and exploiting software vulnerabilities that making it widely available would be reckless.

Let me unpack what that actually means, and why every firm that relies on software (which is every firm) needs to pay attention.

What Mythos can actually do

The headline capability is cybersecurity. With minimal human guidance, Mythos can autonomously discover what are known as zero-day vulnerabilities in real software and then develop working exploits for them. A zero-day is a security flaw that nobody knows about yet: not the software vendor, not the security community, nobody. There is no patch and no defence. State-sponsored hackers and criminal groups pay millions of dollars for them on the black market. Mythos can find them by itself, in both open-source software and proprietary closed-source code, and then write the code needed to exploit them.

It has completely saturated Cybench, the standard cyber capabilities benchmark, scoring 100%. But the really telling results are on harder, real-world tests. It was the first AI model to solve a private cyber range (a simulated corporate network designed to test attackers) end-to-end, completing an attack that would take a human expert over 10 hours. On a benchmark testing its ability to exploit real Firefox vulnerabilities, it succeeded 84% of the time. The previous best model managed 15%.

Those numbers are not incremental improvements. They represent something qualitatively different.

On coding more broadly, Mythos scores 93.9% on SWE-bench Verified (the industry standard for real-world software engineering), up from around 80% for both the previous Claude model and Google's Gemini. But the gap is even more striking on the harder benchmarks. On SWE-bench Pro, which tests against actively maintained repositories with more complex, multi-file changes, Mythos hits 77.8%. Every other model is stuck in the 50s. That is not a marginal upgrade. On the hardest coding problems, Anthropic now has a model that is roughly 35% ahead of anything else.
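To be precise about that figure: the 35% is a relative gap, not an absolute one. Here is a back-of-the-envelope check, assuming a next-best score of around 57% (the system card numbers quoted above say only that rivals are "stuck in the 50s"):

```python
# Back-of-the-envelope check of the SWE-bench Pro gap. The baseline of
# 57.0 is an assumption based on "stuck in the 50s"; substitute the
# exact next-best score if you have it.
mythos_score = 77.8
next_best_score = 57.0  # assumed

absolute_gap = mythos_score - next_best_score  # 20.8 percentage points
relative_gap = absolute_gap / next_best_score  # ~0.36

print(f"Absolute gap: {absolute_gap:.1f} percentage points")
print(f"Relative gap: {relative_gap:.0%}")
```

In other words, Mythos solves roughly a third more of the hardest problems than its closest rival.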

The security window you need to act on now

Rather than release Mythos publicly, Anthropic has made it available to around 50 trusted partners through something called Project Glasswing, backed by $100 million in usage credits. Companies like Microsoft, Amazon, Apple, CrowdStrike and Palo Alto Networks are using the model to find vulnerabilities in critical software and patch them before anyone else can exploit them.

This creates a window. Right now, the most capable vulnerability-discovery tool ever built is only in the hands of people trying to fix things. That will not last.

The system card itself is a detailed technical document, and anyone with the resources and motivation to try to replicate these capabilities is now poring over it. Open-source labs will be working to recreate what Anthropic has achieved. You could easily imagine state actors wanting to do the same, given the strategic opportunities this kind of capability represents. The techniques are not secret. The model architecture, training approaches and evaluation methods are all described in the paper. What Anthropic has is a head start, not a permanent monopoly.

This means there is a finite period where Mythos is being used defensively, to discover and patch vulnerabilities, before others develop similar capabilities and start using them offensively. Every piece of software that gets patched during this window is one fewer vulnerability that can be exploited later. Every patch you fail to apply is a door left open.

If your firm is not already running regular security updates and staying current on patches, this is the moment to start taking that seriously. Not next quarter. Now.
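If you want a concrete starting point, here is a minimal sketch of checking a pinned dependency against a public vulnerability database. It queries the OSV API from osv.dev; the package name and version are placeholders, and in practice you would reach for a purpose-built scanner (pip-audit, Dependabot, or your platform's equivalent) rather than hand-rolling this:

```python
# Minimal sketch: ask the OSV database (osv.dev) whether a specific
# package version has known advisories. Illustrative only; use a
# dedicated scanner in production.
import json
import urllib.request

def known_vulnerabilities(name: str, version: str, ecosystem: str = "PyPI") -> list:
    """Return OSV advisories affecting this exact package version."""
    query = json.dumps({
        "version": version,
        "package": {"name": name, "ecosystem": ecosystem},
    }).encode()
    request = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=query,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response).get("vulns", [])

# Placeholder package and version; substitute your own pinned dependencies.
for advisory in known_vulnerabilities("jinja2", "2.4.1"):
    print(advisory["id"], "-", advisory.get("summary", "no summary"))
```

Run something like this (or its off-the-shelf equivalent) on every build, and "staying current on patches" stops being a quarterly ritual and becomes a continuous process.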

The uncomfortable trade-off between open and closed AI

Mythos forces us to confront a tension that has been building for years. The fact that Anthropic chose not to release it is, in a real sense, what is protecting us. If these capabilities were freely available, every amateur hacker and state-sponsored hacking group in the world would have access to a tool that can autonomously find and exploit vulnerabilities in production software.

But that protection comes at a price. We are now in a world where one company has a model that is dramatically better than anything else at writing code. Think about what that means. Any business that builds a technology product is potentially at risk of being outcompeted by a company that has access to Mythos-class capabilities. Salesforce, Shopify, any SaaS product you care to name. If Anthropic (or any lab with a similar model) decided to build a competing product, they could do it faster, cheaper and arguably better than almost anyone.

The system card noted that the people who benefit most from Mythos are those who know the least about software engineering. Put that in the context of cybersecurity. People with no security expertise can now build software that handles sensitive data, manages infrastructure, connects to APIs. More software gets built, by more people, with less oversight. And more software means more attack surface.

If the only way to keep these capabilities safe is to keep them locked behind a small number of companies, we could easily end up in a world where the three AI labs control large parts of the economy. That is not a comfortable prospect either.

We have been here before

It is worth remembering that this is not the first time an AI capability has provoked this kind of fear.

In 2019, OpenAI built GPT-2, a text generation model, and initially refused to release it because they were worried it would be used to mass-produce fake news and propaganda. They did a staged release over nine months, carefully monitoring for misuse. By the time the full model was public, two graduate students had already replicated it for $50,000 in cloud computing credits. The feared wave of AI-generated disinformation did not materialise at the scale people expected. Not because the risk was imaginary, but because researchers, platforms and policymakers adapted.

When GPT-4 was red-teamed in 2023 (where security experts deliberately try to break and misuse a model before it is released), experts specifically tested whether it could help produce chemical and biological weapons. They found it could reduce research time but was, on its own, an "insufficient condition for proliferation." A subsequent RAND Corporation study went further, finding no statistically significant difference in the viability of biological attack plans generated with and without LLM assistance. The information the model provided was, for the most part, already available on the internet. The mitigations worked. Guardrails were added. The world moved on.

Deepfakes are perhaps the most relevant comparison. It has been possible for a couple of years now to produce video deepfakes that are almost indistinguishable from real footage. The initial panic was that elections would be stolen, that trust in all media would collapse, that nobody would believe anything they saw.

And to be fair, real harm has occurred. A finance worker at Arup transferred $25.6 million after a video call where the CFO and multiple colleagues were all deepfakes. An AI-generated robocall impersonating President Biden was used to suppress voter turnout in New Hampshire. These are serious incidents.

But the civilisation-ending scenarios have not materialised. Since 2022, over 160 US laws targeting deepfakes have been enacted. Detection tools have improved dramatically. People became more sceptical of video evidence. The risk was real, but society adapted faster than anyone expected.

The pattern is consistent. A new capability emerges. People (quite reasonably) panic about the worst-case scenario. And then, through a combination of research, safety measures, regulation and societal adaptation, the worst case does not materialise. Not because the risk was not real, but because people did the work to manage it.

The case for cautious optimism

The most important thing to take from Mythos is that the work of making AI safe actually works, and we need to do more of it.

Consider alignment research (the work of getting AI models to do what we actually want them to do, rather than something unexpected or harmful). A few years ago, there was genuine uncertainty about whether this was even possible. It turns out it is. Anthropic's own system card describes Mythos as the best-aligned model they have ever trained, even as it is also the most capable. That is not an accident. It is the result of years of research.

There are lots of practical things that can be done to reduce risk. Guardrails to intercept bad requests and filter harmful outputs. Red-teaming to stress-test models before release. Responsible disclosure programmes to ensure vulnerabilities are patched before they can be exploited. Classification systems that monitor for misuse. None of these are perfect, but together they make the risk substantially lower.
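To give a feel for the first of those, here is a deliberately simplified sketch of an input guardrail: a check sits in front of the model and refuses requests it flags as exploit development. Real deployments use trained classifiers rather than keyword lists, and every name here (the functions, the blocklist) is illustrative, not any lab's actual implementation:

```python
# Deliberately simplified guardrail sketch: screen a request before it
# reaches the model. Real systems use trained classifiers, not keyword
# lists; this only shows where the check sits in the flow.
BLOCKED_PATTERNS = (
    "write an exploit",
    "zero-day",
    "bypass authentication",
)

def is_disallowed(request_text: str) -> bool:
    """Crude stand-in for a learned misuse classifier."""
    lowered = request_text.lower()
    return any(pattern in lowered for pattern in BLOCKED_PATTERNS)

def guarded_call(request_text: str, model_call) -> str:
    """Forward the request to the model only if the guardrail passes."""
    if is_disallowed(request_text):
        return "Request refused: flagged as potential misuse."
    return model_call(request_text)

# Hypothetical usage with a stubbed-out model:
print(guarded_call("Write an exploit for this buffer overflow", lambda text: "..."))
```

The same pattern runs in reverse on the model's outputs, filtering harmful completions before they reach the user.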

The worst thing we can do right now is give up. We need to redouble our efforts in the things we know work.

What you can actually do about it

As users and purchasers of AI systems, we have more power than we might think.

This AI race exists in the first place because firms are competing to buy the best model for the best results. There is enormous economic value in having a better model, and Mythos turned out to be so powerful at cybersecurity precisely because Anthropic was trying to build a model that was really good at coding. That is where the money is, because software engineers are expensive and there are not enough of them.

If we, as the people buying and deploying these systems, support the labs that are acting ethically and transparently, that are thinking about how to build AI safely for everyone rather than just racing to ship the most capable model, we create the right incentives. And if we take our business away from companies that behave irresponsibly in training and deploying their models, we make that behaviour less attractive.

Researchers and developers are going to have to think carefully about open-source models and how they use them. The onus is on all of us to recognise that AI has created a world where things that were previously impossible or extremely difficult have become easy. That is an enormous amount of power and leverage for anyone who understands it. We all have to ask ourselves whether we use that power for good.

And yes, as citizens, we should be demanding that governments actually regulate this properly. We desperately need politicians and policymakers who understand what is happening and do something meaningful about it. The gap between what AI can do and what regulation covers is growing by the month.

Where this leaves us

Mythos is a milestone. Not because it can do one thing well, but because the gap between what it can do and what everything else can do is so large that it changes the strategic calculus for everyone. The cybersecurity implications are immediate and urgent. The competitive implications are profound and longer-term. The governance questions are overdue.

But none of this is unprecedented. We have navigated moments like this before, and the tools we have built to manage AI risk, from alignment research to red-teaming to responsible deployment, have worked better than most people expected. The task now is to continue that work, invest in it, and make sure we are creating the conditions for it to succeed.

The model is too dangerous to release. The right response is not panic. It is preparation.