Anthropic Engineering Postmortem: Claude's 60-Minute Memory Bug

Detailed Analysis

Anthropic publicly disclosed a significant memory bug affecting its Claude models in a detailed engineering postmortem, revealing that a change shipped on March 26, 2026 introduced a flaw that caused Claude's thinking history to be erased on every conversational turn rather than just once during idle sessions. The original intent of the change was straightforward: clear older reasoning data from sessions that had been idle for more than an hour to reduce latency. Instead, a flawed implementation meant the clearing mechanism triggered continuously throughout an affected session. The bug was present in the Sonnet 4.6 and Opus 4.6 models and was not resolved until April 10, 2026, with the release of v2.1.101—leaving a window of roughly two weeks during which users experienced degraded model behavior. The fix was catalogued as version 2.1.101 of Claude Code.

The practical consequences for end users were disorienting and difficult to immediately diagnose. When a user sent a follow-up message while Claude was mid-tool-use, a new turn would begin under the broken memory-clearing flag, stripping away not just historical reasoning but the current turn's active reasoning as well. Claude would then continue executing tasks without retaining why it had made prior decisions, producing observable symptoms including forgetfulness, repetitive actions, and anomalous tool selections. A secondary and economically significant side effect was that the bug caused cache misses at the API layer, which is the likely explanation for a separate wave of user complaints that Claude Pro usage limits were being consumed far faster than expected—a financially impactful symptom for subscribers relying on those credits for production workflows.

The postmortem is notable for its candid accounting of why the bug proved so difficult to detect and reproduce. Two unrelated concurrent changes obscured the problem: an internal-only server-side experiment involving message queuing, and a separate modification to how extended thinking was displayed that suppressed the bug's visibility in most command-line interface sessions. The defect existed precisely at the intersection of three distinct systems—Claude Code's context management, the Anthropic API, and the extended thinking feature—meaning no single team's test suite covered the full interaction surface. Despite passing multiple rounds of human code review, automated code review, unit tests, end-to-end tests, and automated verification, the bug persisted undetected for over a week after deployment.

The incident is emblematic of a class of infrastructure failures increasingly common as AI systems grow more architecturally complex. Extended thinking, agentic tool use, and multi-turn session management each introduce stateful dependencies that interact in non-obvious ways, and the failure mode here—a system that continues operating confidently while silently lacking critical context—is particularly insidious because it degrades output quality rather than causing an outright failure. Unlike a crash or a timeout, a model that forgets its own reasoning mid-task may not trigger any automated alert. This distinguishes AI system reliability engineering from traditional software reliability, where error states are typically more legible.

Anthropic's decision to publish a detailed postmortem reflects a broader norm emerging among frontier AI labs toward operational transparency, particularly as their systems are embedded in professional and enterprise workflows where reliability failures carry real costs. The disclosure also signals the growing importance of session-state management as a first-class engineering concern. As Claude and similar models are deployed in longer-horizon agentic contexts—executing multi-step tasks across tool calls, file systems, and APIs—the integrity of working memory across turns becomes as critical as model capability itself. The April 2026 incident will likely serve as a reference case for how subtle context-management bugs can propagate into user-facing quality degradation at scale.

Read original article →

Detailed Analysis

Don't Miss a Deploy