
Claude Opus 4.7: Anthropic's AI Finds Decade-Old OS Flaws - UC Today

Google News · April 20, 2026

Detailed Analysis

Anthropic's Claude Opus 4.7, released on April 16, 2026, made headlines by autonomously identifying long-standing security vulnerabilities across every major operating system, some of them undetected for nearly three decades. Rather than withholding the findings, Anthropic shared the discovered flaws with twelve major technology companies, signaling both the model's capability and the company's commitment to responsible disclosure. The discovery came out of Anthropic's internal security testing ahead of the release of its more advanced "Mythos-class" models, which suggests the company is deliberately stress-testing its AI in high-stakes domains before broader deployment. To manage the associated risks, Opus 4.7 ships with automated safeguards that block high-risk cybersecurity uses, alongside a dedicated Cyber Verification Programme designed to grant access to legitimate security professionals.

On technical performance, Claude Opus 4.7 delivers notable gains in several benchmark categories. The model achieves 87.6% on SWE-bench Verified and demonstrates top-tier precision in code review tasks, including the resolution of race conditions that eluded prior model generations. Anthropic attributes much of this uplift to an "adaptive reasoning" architecture, in which the model decides dynamically how long to deliberate before responding rather than operating under a fixed compute budget. A new "xhigh" effort tier and changes to how reasoning traces are handled by default further reflect Anthropic's push to give developers more granular control over cost-performance tradeoffs.

Despite these advances, Claude Opus 4.7 has drawn substantial backlash, particularly from developers. A new tokenizer has reportedly increased costs by up to 35% on code-heavy prompts, and long-context retrieval has degraded sharply, dropping from 78% to 32% on the MRCR benchmark (a fall of 46 percentage points), a significant regression for applications that depend on large context windows. Users have also flagged "combative" response tendencies, erroneous safety flags on benign inputs, and bugs surfacing in production pipelines. API-level breaking changes, including errors triggered by previously valid parameters such as `thinking.budget_tokens` and `temperature`, have further strained developer workflows, and some teams have reverted to Opus 4.6 while Anthropic works to address the issues.
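
For teams hit by the parameter errors described above, one possible stopgap is to strip the affected fields from a request before it is sent. The following is a minimal sketch only: the helper `strip_rejected_params`, the payload layout, and the model name are hypothetical illustrations, not part of any official SDK, and the list of rejected parameters is taken solely from the two named in this report.

```python
from copy import deepcopy

# Assumption: only the two parameters named in the report are treated as rejected.
REJECTED_PARAMS = ("thinking.budget_tokens", "temperature")


def strip_rejected_params(payload: dict, rejected=REJECTED_PARAMS) -> tuple[dict, list[str]]:
    """Return a copy of `payload` without the rejected parameters,
    plus the list of dotted paths that were actually removed."""
    cleaned = deepcopy(payload)  # leave the caller's dict untouched
    removed = []
    for path in rejected:
        *parents, leaf = path.split(".")
        node = cleaned
        # Walk down nested dicts; stop if any level is missing or not a dict.
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if isinstance(node, dict) and leaf in node:
            del node[leaf]
            removed.append(path)
    return cleaned, removed


# Hypothetical request payload; the field layout is illustrative only.
request = {
    "model": "example-model",
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 1024},
}
cleaned, removed = strip_rejected_params(request)
print(removed)
print(cleaned)
```

Returning the removed paths alongside the cleaned payload lets a pipeline log exactly what was dropped, which matters when a team is deciding whether to adapt to the new behaviour or revert to an earlier model.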

The broader significance of Opus 4.7's OS vulnerability discovery lies in what it signals about AI's emerging role in offensive and defensive cybersecurity. That an AI model independently surfaced flaws human researchers had missed for decades, across the full spectrum of major operating systems, illustrates the scale at which large language models can now conduct systematic code analysis. This development sharpens a pressing policy and industry question: how should AI capabilities in vulnerability research be governed, shared, and commercialized? Anthropic's choice to disclose findings to affected companies rather than exploit or publish them outright positions the company as a responsible actor, but it also underscores the dual-use tension inherent in deploying highly capable AI in security contexts.

The mixed reception of Opus 4.7 reflects a recurring pattern in frontier AI releases, where capability improvements arrive bundled with regressions, pricing shifts, and infrastructure disruptions that create friction for existing users. Anthropic's acknowledgment that further tuning is underway suggests the model is being shipped in an iterative state, a practice that has become normalized across the industry but which carries compounding risk when the model in question is capable of identifying zero-day vulnerabilities. The episode underscores the growing tension between rapid capability deployment and the operational stability that enterprise and developer communities increasingly require, a tension that will only intensify as AI systems are integrated deeper into critical infrastructure workflows.
