In case you don't know why Claude models keep getting worse after the 4.7 release: Anthropic lets OpenClaw be used again. The model was changed mainly so it can handle this traffic at the lowest cost, at the expense of code users.

According to the post, Claude models degraded in quality after the 4.7 release because Anthropic allowed OpenClaw to be used again. The post claims the model was modified to handle the increased traffic at the lowest cost, prioritizing cost efficiency over performance for code users.

Detailed Analysis

Claude Opus 4.7, released by Anthropic in 2026, has generated substantial backlash from developers and power users who report measurable performance regressions compared to its predecessor, Opus 4.6. The most striking documented decline involves long-context retrieval, where benchmark scores on the MRCR v2 evaluation fell from 78.3% at one million tokens under Opus 4.6 to just 32.2% under Opus 4.7, a drop of about 46 percentage points, or nearly 59% in relative terms. At 256,000 tokens, retrieval accuracy similarly fell from 91.9% to 59.2%. Compounding these regressions, a new tokenizer introduced with 4.7 consumes up to 35% more tokens for equivalent input, effectively raising costs for users even though Anthropic left nominal pricing unchanged. Developer communities have also reported that Claude Code now incorrectly flags routine operations like file I/O and network calls as potential malware, and at least one analysis of nearly 7,000 sessions found a 67% drop in perceived reasoning depth since February.
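The arithmetic behind these figures can be checked directly. The following is a minimal sketch, not an official calculation: the function names are illustrative, the scores are copied from the paragraph above, and the cost estimate assumes the per-token price is unchanged and that the 35% upper bound on token inflation applies to the whole input.

```python
"""Sanity-check the benchmark and cost figures quoted in the text."""

# MRCR v2 scores quoted above: (Opus 4.6, Opus 4.7), in percent.
MRCR_SCORES = {
    "1M tokens": (78.3, 32.2),
    "256K tokens": (91.9, 59.2),
}


def point_drop(before: float, after: float) -> float:
    """Absolute decline, in percentage points."""
    return before - after


def relative_drop(before: float, after: float) -> float:
    """Decline as a percentage of the starting score."""
    return (before - after) / before * 100.0


def effective_cost_multiplier(token_inflation: float) -> float:
    """Cost multiplier for the same input when the per-token price is fixed.

    token_inflation is the fractional increase in token count (0.35 for 35%).
    Assumes cost scales linearly with tokens, which is a simplification.
    """
    return 1.0 + token_inflation


def summarize() -> dict:
    """Return {context length: (point drop, relative drop)}, rounded to 0.1."""
    return {
        label: (round(point_drop(old, new), 1), round(relative_drop(old, new), 1))
        for label, (old, new) in MRCR_SCORES.items()
    }


if __name__ == "__main__":
    for label, (points, relative) in summarize().items():
        print(f"{label}: {points} points, {relative}% relative")
    print(f"Worst-case cost multiplier: {effective_cost_multiplier(0.35):.2f}x")
```

Run on the quoted numbers, this gives a decline of 46.1 points (58.9% relative) at one million tokens and 32.7 points (35.6% relative) at 256,000 tokens. Under the stated assumptions, the tokenizer change amounts to a worst-case cost multiplier of 1.35x for the same input.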

The Reddit post in question attributes these regressions to Anthropic re-enabling a model or service referred to as "OpenClaw," claiming the architecture was altered to handle increased traffic volume at minimum cost, deliberately sacrificing quality for developer workloads. This specific explanation, however, is not supported by any verifiable technical disclosure from Anthropic or independent third-party analysis. No documentation confirms the existence or reactivation of an entity called "OpenClaw," nor has Anthropic acknowledged any traffic-cost-optimization tradeoff targeting code users specifically. The claim circulates as a community theory rather than an established fact, and its spread reflects the degree of distrust that has accumulated between Anthropic and its developer base following several perceived post-release capability degradations to prior models.

Anthropic's own framing of Opus 4.7 emphasizes genuine improvements in specific domains: vision processing received a significant upgrade, image resolution tripling in supported contexts; document reasoning errors declined by 21% relative to Opus 4.6; and instruction-following in agentic workflows was cited as meaningfully stronger. The company also replaced the explicit "Extended Thinking" toggle with an "Adaptive Reasoning" system that auto-calibrates compute expenditure per task, defaulting to an "xhigh" setting within Claude Code. This architectural shift appears to be a central source of confusion — some users reporting degradation may be encountering the model at lower adaptive-reasoning settings than they previously used, while others at high or xhigh settings report comparable or improved results. The behavioral surface of the model has changed in ways that are sensitive to prompt wording, making consistent evaluation difficult across user populations.

The broader significance of this controversy lies in the structural tension between optimizing AI models for general consumer use cases and maintaining the depth of capability that professional developers depend on for production-grade tasks. Unlike casual conversational use, software development workflows rely on long-context fidelity, consistent reasoning chains, and conservative false-positive rates in code analysis — precisely the dimensions where Opus 4.7's documented regressions are most damaging. Anthropic positioned the 4.7 release partly as a safety proving ground ahead of its more powerful forthcoming "Mythos" model, which was reportedly withheld from release due to cybersecurity risk assessments. This framing did little to mollify developers whose automated pipelines, bug-fix workflows, and agent chains broke in measurable ways following the update.

The episode illustrates a recurring dynamic in frontier AI development: as companies scale models to serve massive and varied user bases, the aggregate optimization pressures of cost, safety, and general capability can erode the specialized performance that early adopters — often the developers who built critical infrastructure on the platform — depend upon. Whether Anthropic's architectural choices for Opus 4.7 reflect deliberate cost-traffic tradeoffs, safety-motivated behavioral constraints, unintended regression artifacts, or some combination of all three remains unclear from publicly available evidence. What is clear is that the gap between Anthropic's communicated improvements and the lived experience of a significant portion of its technical user base has widened, making transparent, granular capability disclosures an increasingly urgent expectation from the developer community.
