← Reddit

Has Claude become less intelligent? I had a frustrating day with Claude.

Reddit · lostinthelimbo · April 24, 2026
A user encountered multiple problems with Claude models during code reviews, including inconsistent findings counts from Opus 4.6 (reporting 44, then 34, then 64 findings) and rapid quota depletion. Sonnet 4.6 fabricated false information, misinterpreted terminal symbols, and failed to resolve issues it claimed to have already fixed in previous sessions. The user questioned whether such problems represent a common occurrence or indicate declining performance in older models following the release of newer versions.

Detailed Analysis

Anthropic's Claude AI assistant experienced a documented period of degraded performance in early 2026, prompting widespread user frustration and public scrutiny of the company's model quality practices. The article captures a representative user experience: Claude Opus 4.6 produced inconsistent output counts across successive responses during a code review session — reporting 44 findings, then saving only 34, then overcorrecting to 64 — while Claude Sonnet 4.6 exhibited hallucination-adjacent behaviors, misreading terminal symbols, fabricating justifications, altering user-provided text, and falsely confirming that previously reported bugs had been fixed. These are not trivial UX annoyances; they represent failures in the core reliability contract that professional developers depend on when integrating AI into engineering workflows.

Anthropic has publicly acknowledged and diagnosed the root causes through a formal postmortem. Three distinct configuration-level changes degraded the Claude Code experience in succession. On March 4, the default reasoning setting for Claude Code was quietly reduced from "high" to "medium" to improve response latency — a tradeoff that users clearly did not endorse. On April 16, a separate change reduced response verbosity, which in combination with other prompt modifications measurably hurt coding output quality. These changes rolled out across different traffic segments on different schedules, producing the inconsistent and difficult-to-reproduce degradation that many users, including the article's author, described as sudden and confusing. Anthropic has since stated that all three issues have been resolved.

The episode carries significant implications for how AI companies communicate infrastructure and prompt-layer changes to their users. The author's question — "have older models become less intelligent since the launch of newer ones?" — reflects a reasonable but ultimately incorrect inference. The underlying model weights did not change; what changed were the system-level configurations governing how those weights were invoked. This distinction matters enormously, yet it is nearly invisible to end users, who experience only the degraded output. Anthropic's postmortem is notable for its transparency, explicitly denying the practice of throttling model quality based on demand or server load, but the fact that three compounding changes could degrade a flagship coding product for weeks before resolution raises questions about internal regression testing and deployment protocols.

The broader trend here connects to a recurring tension in the commercialization of large language models: the pressure to reduce inference costs and latency — both economically and competitively motivated — can conflict directly with the quality expectations of power users. Claude Code in particular has been positioned as a serious engineering tool for professional developers, a user base with low tolerance for inconsistency and high sensitivity to subtle degradations in reasoning quality. The timing is also notable, occurring amid intense competitive pressure from rival coding assistants, making any perceived quality regression especially reputationally costly. Anthropic's willingness to publish a detailed postmortem suggests a strategic calculation that transparency serves trust better than silence, a posture that distinguishes it from some competitors but also sets a higher accountability standard for future incidents.

Read original article →