How a cavalcade of blunders gave unauthorized users access to Claude Mythos — restricted model accessed by third parties, thanks to knowledge from data breach - Tom's Hardware

Unauthorized third parties accessed Claude Mythos, a restricted artificial intelligence model, through a series of operational blunders. The security incident was facilitated by information obtained from a data breach.

Detailed Analysis

Anthropic's restricted cybersecurity AI model, Claude Mythos Preview, was accessed by unauthorized third parties through a compounding series of security failures involving a contractor employee, weak access controls, and intelligence harvested from an unrelated data breach. Claude Mythos, developed under Anthropic's Project Glasswing initiative, is a highly sensitive tool capable of detecting and exploiting digital vulnerabilities, and as such has been limited to a narrow set of vetted partners including select companies and non-profits. The breach occurred on the same day as the model's limited release announcement — a timing that likely contributed to gaps in access governance. A worker at a third-party contractor firm used knowledge of Anthropic's file system structures and model naming conventions to guess the model's endpoint URL, then shared that access with colleagues through a private Discord group. The group has reportedly kept a low profile, confining their use to mundane tasks like website creation and deliberately avoiding cybersecurity-related prompts to evade detection. They provided proof of access to reporters via screenshots and a live demonstration.

The intelligence that made the breach possible traces back to a separate compromise of Mercor, an AI feedback recruitment firm whose user data was exposed in an earlier hack. That incident had already generated significant collateral damage — including lawsuits and the suspension of contracts with major clients such as Meta — and its downstream consequences are now extending into Anthropic's supply chain. The Mercor breach apparently furnished attackers with enough institutional knowledge about Anthropic's internal conventions to make informed guesses about restricted model URLs, illustrating how breaches at peripheral firms can propagate risk into the ecosystems of far more security-conscious organizations. Anthropic confirmed it is investigating potential unauthorized access through a third-party vendor environment, while asserting that its core systems showed no signs of compromise and that no external impacts had been detected.

The incident exposes a structural vulnerability in how powerful AI models are deployed and access-controlled, particularly when third-party contractors are part of the operational chain. That guessing a model's endpoint URL was reportedly enough to reach a cybersecurity-grade AI system, one specifically designed to find and exploit code-based flaws, represents a striking mismatch between the sophistication of the tool and the robustness of the access governance surrounding it. Security experts have pointed to this episode as a textbook example of how human factors, including social engineering, weak vendor controls, and procedural oversights, can circumvent technical safeguards that might otherwise be sound. The supply chain dimension is particularly salient: Anthropic says its own core systems were not compromised, yet an external actor was able to leverage contractor relationships and third-party breach data to penetrate a high-value restricted environment.
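The gap between knowing an endpoint's address and being authorized to use it can be illustrated with a minimal sketch. This is not a description of Anthropic's or any vendor's actual setup; the model name, partner ID, token, and the `authorize` function are all hypothetical, chosen only to show a deny-by-default check that also logs every decision.

```python
import hmac
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("model_access")


@dataclass(frozen=True)
class Caller:
    partner_id: str
    token: str


# Hypothetical allowlist: restricted model name -> {partner ID: secret token}.
# A real deployment would keep secrets in a secrets manager, not in source.
ALLOWLIST = {
    "restricted-model": {"partner-a": "example-token-a"},
}


def authorize(model: str, caller: Optional[Caller]) -> bool:
    """Deny by default: knowing the model name or URL grants nothing."""
    granted = False
    if caller is not None:
        expected = ALLOWLIST.get(model, {}).get(caller.partner_id)
        if expected is not None:
            # Constant-time comparison avoids leaking token contents via timing.
            granted = hmac.compare_digest(expected.encode(), caller.token.encode())
    # Log every decision, granted or denied, so unusual access can be audited.
    logger.info(
        "model=%s partner=%s granted=%s",
        model,
        caller.partner_id if caller else None,
        granted,
    )
    return granted
```

Under these assumptions, a request that arrives with the right URL but no vetted credential is rejected, and the rejection still leaves an audit record.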

More broadly, the Claude Mythos incident reflects a growing tension in AI development between the desire to deploy increasingly powerful models with specialized capabilities and the difficulty of keeping access to those models tightly controlled. As AI systems move from general-purpose assistants to purpose-built tools for sensitive domains (cybersecurity, biomedical research, critical infrastructure analysis), the consequences of unauthorized access escalate sharply. The cybersecurity community has long understood that access control is only as strong as its weakest link, and AI deployment pipelines are now exposed to the same dynamic at scale. The episode is likely to accelerate calls for more rigorous vendor vetting standards, end-to-end access logging, and mandatory security audits for any third-party partners operating near restricted model environments, particularly as labs like Anthropic continue developing frontier-class tools under heightened public and regulatory scrutiny.
