Detailed Analysis
Anthropic publicly acknowledged in April 2026 that a series of overlapping engineering missteps — not intentional model changes — caused a month-long degradation in Claude Code's performance that had drawn significant user backlash beginning in February. The company published a formal postmortem identifying three distinct technical failures concentrated in March and April. On March 4, the default reasoning effort level was quietly reduced from "high" to "medium" in an apparent attempt to cut latency associated with extended thinking, which caused the model to under-allocate cognitive resources on complex engineering tasks, leading to fabricated outputs such as incorrect API versions and git SHAs. Then, on March 26, a cache bug erroneously cleared thinking sessions after every prompt-response cycle — a behavior intended only for sessions idle for more than an hour — making Claude appear forgetful and repetitive to users mid-session. These two changes compounded each other, tripling reasoning reversals, stalling sessions every one to two minutes, and multiplying API calls eight to sixteen times beyond expected scaling levels. Anthropic confirmed fixes were deployed by April 10 for Sonnet 4.6 and Opus 4.6, with the default reasoning effort now elevated to "xhigh" in version 2.1.118.
The user experience during this period was markedly poor. Reports emerging as early as February described a 67% drop in thinking depth, incoherent planning loops with more than twenty reversals per response, ignored instructions, and instances where Claude itself appeared to acknowledge quality failures. These issues were specifically localized to Claude Code, the Claude Agent SDK, and the Claude Coworker tools, and did not affect the core Claude API — a distinction that became significant in Anthropic's diagnosis, as it helped isolate the bugs to agentic tooling layers rather than the underlying model weights. The scale-up of concurrent user sessions during this period likely amplified the bugs' visibility, as compounding errors in multi-step agentic workflows are far more disruptive than equivalent failures in single-turn interactions.
The episode carries broader significance for the AI development industry because it illustrates the acute sensitivity of agentic AI systems to infrastructure-level decisions that would be relatively inconsequential in simpler deployments. Reducing a reasoning budget or misconfiguring a cache-clearing interval are changes that might pass unnoticed in a conversational chatbot context, but in long-horizon coding agents that must maintain coherent state across many sequential steps, such changes can produce catastrophic degradation of perceived intelligence. Anthropic's framing of the phenomenon as a "creeping incompetency" perception — driven by bugs rather than model regression — is a meaningful distinction, but one that users had no way to make themselves in real time, underscoring a transparency gap in how agentic AI systems communicate their operational status to end users.
Anthropic's willingness to publish a detailed postmortem, naming specific dates and specific engineering decisions, represents a relatively mature incident-response posture compared to industry norms, where AI companies have historically been opaque about performance regressions. The company was careful to distinguish these March–April 2026 bugs from separate infrastructure incidents in August–September 2025 involving output corruption and TPU miscompilation, as well as from a contemporaneous but unrelated April 2026 leak of approximately 2,000 source code files attributed to human error in release packaging. By delineating these incidents clearly, Anthropic appears to be managing a more complex reputational challenge: convincing a user base that had experienced multiple distinct failures over several months that each had an identifiable, bounded cause and a confirmed fix.
The broader trend this incident reflects is the growing operational complexity of deploying frontier AI models in agentic, multi-tool environments. As Claude Code, the Agent SDK, and similar products push AI systems toward autonomous, long-running task execution, the surface area for subtle but consequential configuration failures expands dramatically. Latency optimizations, cost-reduction measures, and cache management strategies — all routine concerns in traditional software engineering — become high-stakes variables when the system's outputs depend on sustained reasoning coherence across many interdependent steps. Anthropic's postmortem serves as an early case study for an industry that will increasingly need formalized processes for monitoring, diagnosing, and publicly accounting for performance degradation in agentic AI systems operating at scale.
Read original article →