Detailed Analysis
Anthropic's Claude Mythos Preview has become the first AI model to autonomously complete end-to-end attacks against small, weakly defended enterprise networks, according to controlled testing conducted by the UK's AI Security Institute (AISI). The evaluation centered on "The Last Ones" (TLO), a simulated 32-step corporate network intrusion that typically takes skilled human security professionals approximately 20 hours to execute. Mythos Preview completed this multi-stage scenario autonomously. It achieved a 72.4% exploit success rate and built individual exploits for under $2,000 each, in less than a day. The model demonstrated a wide range of offensive capabilities: chaining four vulnerabilities to escape both the browser renderer sandbox and the OS sandbox, exploiting subtle Linux kernel race conditions and KASLR bypasses for local privilege escalation, and constructing a remote code execution exploit against FreeBSD's NFS server by splitting a 20-gadget ROP chain across multiple network packets. It also independently identified and weaponized CVE-2026-4747, a 17-year-old FreeBSD vulnerability, to gain full unauthenticated root access to exposed servers.
The significance of these results lies in how sharply they raise the bar for autonomous offensive capability in AI systems. Previous frontier models had demonstrated only narrow assistance with individual exploitation steps. Mythos Preview's ability to chain disparate vulnerabilities across a realistic corporate network topology, without human guidance at each stage, represents a qualitative leap. AISI's framing is careful, however: the tested environments deliberately lacked active defenders, security monitoring, and endpoint detection tools, and triggering alarms carried no consequences. The institute explicitly cautions that whether Mythos Preview could succeed against a hardened enterprise network with live incident response teams remains an open question. Planned future evaluations in such environments will be critical to understanding the operational risk these capabilities actually pose.
The broader context is one of rapidly accelerating AI capability in dual-use domains, where the same reasoning and code-generation skills that make frontier models productive for software engineers also lower the barrier to sophisticated cyberattacks. Work that once required a team of experienced penetration testers over several days can now be carried out, at least in undefended test environments, by a capable AI agent given network access and a prompt. This dynamic places considerable pressure on the cybersecurity industry to accelerate defensive tooling, threat modeling, and detection frameworks that account for AI-driven attackers operating at machine speed. The sub-$2,000 per-exploit cost figure is particularly noteworthy: it suggests that the economic gatekeeping function once served by the high cost of skilled exploit development is eroding.
Anthropic's publication of these findings through its red-teaming infrastructure signals an ongoing commitment to transparency about frontier model risks, a posture consistent with the company's publicly stated safety priorities and its cooperation with government institutes like AISI. The collaboration also reflects a maturing relationship between AI developers and government evaluation bodies, in which structured third-party evaluation is becoming a standard component of the pre-deployment process for frontier models. Nevertheless, the Mythos results highlight a fundamental tension that the industry has not resolved: the same model architectures that produce transformative productivity gains are also producing transformative offensive capabilities. The pace of capability development appears to be outstripping both defensive adaptation and the governance frameworks designed to contain misuse.