← Reddit

A group of users leaked Anthropic's AI model Mythos by reportedly guessing where it was located

Reddit · fortune · April 23, 2026
Anthropic's Mythos AI model was accessed by an unauthorized group of users shortly after its public announcement, with access reportedly facilitated by a third-party contractor working for Anthropic. The group located the model by using previously leaked information about Anthropic's infrastructure obtained from the AI training startup Mercor. The unauthorized users have maintained continuous access to the model since its release, though they have not reportedly used it for cyberattacks.

Detailed Analysis

Anthropic's unreleased Claude Mythos model became public knowledge in late March 2026 not through the Discord-based guessing scheme described in some reporting, but through a straightforward configuration error in the company's own content management system. Security researchers Roy Paz of LayerX Security and Alexandre Pauwels of the University of Cambridge were the first to discover the exposed data store, which made nearly 3,000 unpublished internal assets accessible — including a draft blog post that detailed the model's capabilities in significant depth. Fortune reviewed the leaked documents, notified Anthropic on March 26, 2026, and published its report the following day, after which Anthropic moved to restrict access and attributed the incident to "human error." The account involving a contractor-connected Discord group guessing a model's server location based on information from AI training startup Mercor does not align with the documented chain of events, which involved no unauthorized credential use or social engineering — only an improperly secured public-facing endpoint.

The substance of what leaked is arguably more consequential than the leak mechanism itself. Anthropic's internal draft described Mythos as a "step change" in capability, positioned above the existing Opus line, with markedly advanced reasoning, coding, and cybersecurity-specific skills. Most strikingly, the draft characterized Mythos as capable of autonomously identifying and exploiting software vulnerabilities and patching its own code — a class of capability that moves AI systems meaningfully closer to acting as independent offensive cyber actors. Anthropic's own language in the draft warned of "unprecedented cybersecurity risks" if the model were misused at scale, including the potential to enable coordinated, large-scale cyberattacks. In response, the company indicated it was pursuing a highly controlled rollout, initially limiting access to enterprise security teams and vetted early access customers rather than releasing the model broadly.

The incident sits at the intersection of two distinct but related concerns that have grown more prominent as frontier AI capabilities expand: operational security at AI labs, and the dual-use nature of increasingly powerful models. On the operational side, the exposure of nearly 3,000 unpublished assets through a CMS misconfiguration — rather than through a sophisticated breach — underscores that the most significant near-term security vulnerabilities at leading AI companies may be mundane infrastructure failures rather than nation-state intrusions. The fact that Apple-focused leaker M1Astra was able to archive a copy of the draft post on X before Anthropic could fully contain the exposure illustrates how quickly such lapses can become irreversible in the modern information environment.

More broadly, Mythos represents the clearest public signal yet that Anthropic is deliberately developing AI systems with offensive cybersecurity applications in mind, even as it frames their deployment in defensive terms. This mirrors a wider pattern across the frontier AI industry, where labs are racing to produce models capable of acting as autonomous agents in complex technical domains — including security research — while simultaneously grappling with governance frameworks for capabilities that could be trivially weaponized. Anthropic's cautious rollout strategy, beginning with enterprise security teams, reflects an implicit acknowledgment that the gap between "beneficial security research tool" and "cyberattack enabler" is narrow enough to require gatekeeping at the distribution layer. Whether that gatekeeping holds under commercial pressure remains an open question, and the Mythos leak — regardless of its mundane cause — has already ensured that the model's existence and capabilities are now part of the public record.

Read original article →