Detailed Analysis
Anthropic's Claude Opus 4.7, released in late April 2026, has drawn a wave of developer criticism over an Acceptable Use Policy (AUP) classifier that blocks legitimate professional requests at an unusually high rate. The model's guardrails, deliberately tuned to be strict as part of a broader safety framework, are generating false positives across a wide range of unrelated use cases. Developers have flooded Anthropic's GitHub repository with complaints documenting concrete failures: Russian-language prompts refused across psychology, web application, and infrastructure projects; computational structural biology tasks flagged as policy violations; and standard development workflows grinding to a halt. For paying enterprise customers these are not abstract grievances: they cannot complete billable, legitimate work with a product they are paying for.
The root of the overreach lies in a deliberate architectural decision by Anthropic. Opus 4.7 was designed in part as a testing environment for safety mechanisms originally developed for Mythos, a more capable internal model that Anthropic deemed too dangerous to release publicly because of its advanced cybersecurity capabilities. In embedding those heightened guardrails in a commercially deployed model, Anthropic appears to have miscalibrated the classifier's sensitivity, producing a system that treats a disproportionate range of benign inputs as potential threats. This is a policy failure rather than a capability failure: Opus 4.7 is not broken in what it can do, but in how it decides what it is permitted to do.
Compounding the content-filtering controversy is a separate but equally disruptive issue involving tokenization. Opus 4.7 ships with a new tokenizer that uses between 1.0 and 1.35 times as many tokens as prior versions for the same input, an increase of up to 35%. This has tangible financial and practical consequences: some Claude Pro subscribers report exhausting their monthly usage limits after as few as three queries. The combination of over-restriction and increased cost creates a particularly damaging user experience. The model simultaneously refuses more and charges more, undermining the value proposition for both individual subscribers and enterprise developers who have built workflows around previous Claude versions.
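The effect of the token multiplier on a fixed usage allowance can be illustrated with a few lines of arithmetic. The sketch below is only a back-of-the-envelope model: the 1.0 and 1.35 multipliers come from the figures above, while the budget size, the per-request token count, and the function names are hypothetical values chosen for illustration and do not reflect Anthropic's actual limits or billing.

```python
def inflated_tokens(baseline_tokens: int, multiplier: float) -> int:
    """Tokens consumed under the new tokenizer by text that previously
    consumed `baseline_tokens`, for a given multiplier (1.0 to 1.35)."""
    return round(baseline_tokens * multiplier)


def requests_within_budget(budget_tokens: int,
                           tokens_per_request: int,
                           multiplier: float) -> int:
    """Whole requests that fit in a fixed token budget after inflation."""
    return budget_tokens // inflated_tokens(tokens_per_request, multiplier)


# Hypothetical numbers, not taken from any published limit.
BUDGET = 1_000_000   # assumed token allowance
PER_REQUEST = 10_000  # assumed tokens per request under the old tokenizer

best_case = requests_within_budget(BUDGET, PER_REQUEST, 1.0)
worst_case = requests_within_budget(BUDGET, PER_REQUEST, 1.35)
print(f"requests at 1.00x: {best_case}")
print(f"requests at 1.35x: {worst_case}")
```

Under these assumed numbers the same allowance covers roughly a quarter fewer requests at the top of the reported range, which is why identical workflows can hit usage limits sooner without any change in the prompts themselves.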
Despite these complaints, Anthropic reports genuine capability improvements in Opus 4.7, including a 14% performance gain on complex multi-step workflows and stronger benchmark results on legal document analysis and coding tasks. The model is therefore a step forward in raw performance even as it stumbles operationally. The backlash spreading across social media and developer forums reflects a recurring tension in frontier AI deployment: safety enhancements and capability improvements are often developed in parallel but integrated unevenly, and calibration errors become apparent only under real-world, diverse use that internal testing does not fully anticipate.
The episode is emblematic of a broader challenge facing AI labs as their models grow more capable and their safety mechanisms grow correspondingly more elaborate. The instinct to err on the side of caution is defensible in principle, particularly when the safeguards in question were built for a model whose capabilities were deemed too risky for public release. However, Anthropic's difficulty in tuning Opus 4.7's classifier to distinguish genuine threats from Russian-language psychology texts and structural biology queries illustrates how blunt current AUP enforcement mechanisms remain. As competitors including OpenAI and Google DeepMind continue to refine their own balance between safety and capability, the commercial and reputational cost of overcorrection, not just undercorrection, is emerging as a serious and measurable risk for AI developers.