Skip to main content
Claude Mythos AI breaches weak networks, scoring 93% practitioner, 73% expert in cybersecurity.

Editorial illustration for Claude Mythos breaches weak networks, scores 93% practitioner, 73% expert

Claude Mythos: AI Breaches Enterprise Networks Autonomously

Claude Mythos breaches weak networks, scores 93% practitioner, 73% expert

2 min read

Enterprise security teams have long worried about AI tools that can navigate poorly defended networks without human guidance. Claude Mythos, Anthropic’s latest model, claims to do just that—moving from initial foothold to full compromise in a single, autonomous run. The test suite used to gauge its abilities separates everyday operational scenarios from the kind of deep‑dive analysis that only seasoned experts typically handle.

Results that cross the ninety‑percent mark on the former and clear the seventy‑percent threshold on the latter suggest a shift in what automated agents can achieve. What makes the expert‑level figure stand out is the benchmark set by the AI Security Institute: until April 2025, no system had cracked those higher‑tier challenges. The model’s performance also hinges on a substantial compute allocation—roughly fifty million tokens—raising questions about the resources required to replicate such outcomes.

As organizations scramble to shore up defenses, these numbers provide a concrete reference point for what an AI‑driven adversary might look like in practice.

With a larger compute budget (50 million tokens), Mythos Preview scores around 93 percent on practitioner tasks and 73 percent on expert-level challenges. That expert-level number is particularly notable: according to AISI, no model could solve expert-level tasks before April 2025. Anthropic's Claude Mythos can autonomously hack corporate networks CTF challenges only test individual skills in isolation, but real cyberattacks require chaining dozens of steps across multiple hosts and network segments, the AISI says. To measure that kind of complexity, the institute developed a simulation called "The Last Ones" (TLO): a 32-step attack against a simulated corporate network, from initial reconnaissance to full network takeover.

Claude Mythos has now demonstrated the ability to run a full 32‑step attack on a simulated corporate network, taking control of the environment in three minutes. The British AI Security Institute reports a 73 percent success rate on expert‑level capture‑the‑flag challenges and about 93 percent on practitioner tasks when the model is given a larger compute budget of 50 million tokens. Those numbers are the highest recorded for any AI model up to April 2025, according to the institute.

Yet the tests were conducted on a simulated network, not on live enterprise infrastructure. It is unclear how the model would fare against hardened defenses, multi‑factor authentication, or real‑time monitoring. The evaluation also leaves open questions about the scalability of the approach beyond the preview version.

While the results suggest that AI can automate parts of a cyber‑attack chain, the practical implications for security teams remain to be clarified. Further independent testing will be needed to gauge the true risk posed by such capabilities.

Further Reading

Common Questions Answered

How does Claude Mythos perform on cybersecurity challenges across different skill levels?

Claude Mythos demonstrates impressive performance with a 93% success rate on practitioner-level tasks and a 73% success rate on expert-level cybersecurity challenges. These scores are particularly significant, as the British AI Security Institute notes that no previous model could solve expert-level tasks before April 2025.

What specific network penetration capabilities has Claude Mythos demonstrated?

Claude Mythos has proven its ability to autonomously hack corporate networks by successfully executing a full 32-step attack on a simulated corporate environment in just three minutes. The model can chain multiple complex steps together, moving from an initial network foothold to complete system compromise without human intervention.

What compute resources are required for Claude Mythos to achieve its high performance?

Claude Mythos achieves its impressive cybersecurity performance with a larger compute budget of 50 million tokens. This increased computational resource allows the model to tackle more complex challenges and demonstrate higher success rates across both practitioner and expert-level security tasks.