Claude Mythos: Why Anthropic Won’t Release Its New AI Model - Built In

Detailed Analysis

Anthropic has declined to publicly release Claude Mythos Preview, its most technically advanced AI model to date, citing serious concerns about the model's exceptional and emergent cybersecurity capabilities. Internal safety evaluations revealed that Mythos Preview demonstrated an ability to detect and exploit software vulnerabilities at a rate more than five times greater than prior model generations — a capability that arose without any specific training for offensive security tasks. The company determined that the risk of this model falling into the wrong hands outweighed the benefits of a standard commercial release, marking one of the more consequential decisions by a leading AI lab to deliberately withhold a frontier model from public deployment on safety grounds.

In place of Mythos Preview, Anthropic released Claude Opus 4.7, a model that sits below Mythos in raw capability but incorporates dedicated safeguards designed to detect and block high-risk cybersecurity requests, including prohibited hacking use cases. The sequenced release strategy appears deliberate: by deploying Opus 4.7 in real-world conditions first, Anthropic can stress-test its protective mechanisms before attempting to safely scale Mythos-class models for broader access. Security researchers and professionals can access Opus 4.7 through Anthropic's Cyber Verification Program, a controlled channel intended to preserve legitimate vulnerability research while minimizing avenues for misuse.

The decision is situated within Anthropic's broader Project Glasswing initiative, which focuses on understanding and managing AI's implications for cybersecurity infrastructure. Mythos Preview's top scores on benchmarks such as CTI-REALM — a standard for measuring vulnerability-finding performance — give concrete shape to the risk calculus Anthropic faced. Notably, alignment evaluations reportedly identified Mythos Preview as the best-aligned model the company had trained to date, a finding that complicates the narrative around safety and capability being straightforwardly in tension. High alignment scores did not translate into a green light for release, suggesting Anthropic views the cybersecurity risk vector as a distinct and more acute concern than general alignment metrics can fully address.

The episode reflects a significant and still-evolving challenge across the frontier AI industry: the dual-use nature of advanced reasoning and analytical capabilities. A model that is extraordinarily good at understanding complex systems — including software architecture and security protocols — is, almost by definition, a model that could be weaponized for offensive cyber operations. Anthropic's withholding of Mythos Preview represents one of the clearest public examples of a lab voluntarily suppressing a completed model for safety reasons rather than releasing it with caveats or restricted access. This stands in contrast to approaches taken by some competitors, and sets a precedent that may influence how regulators, researchers, and other labs think about the threshold at which a model's risk profile warrants non-release rather than conditional deployment.

Anthropic's handling of Mythos Preview also signals that the frontier of AI safety concern is shifting. For years, alignment — ensuring models behave in accordance with human values and intentions — dominated the safety conversation. The Mythos case suggests that near-term, concrete risks from highly capable but well-aligned models are emerging as an equally pressing concern. A model can be cooperative, honest, and well-intentioned and still represent a serious societal hazard if its underlying capabilities can be redirected by malicious actors. This framing, embedded in Project Glasswing and the Mythos decision, may mark a maturation in how leading AI developers publicly articulate the distinction between alignment safety and capability safety — and how they act on that distinction in real deployment decisions.
