Detailed Analysis
Anthropic reduced the context cache time-to-live (TTL) for Claude Code from one hour to five minutes around March 2026, a change that has generated notable frustration among developers who rely on long-running, high-context sessions. The shift was first noticed by users receiving in-app prompts like "new task? /clear to save 332.6k tokens" after leaving sessions idle for even a few hours — a message indicating that the previously cached context had expired and would need to be reprocessed at full token cost to continue. The Reddit post in question reflects a broader wave of user complaints: developers who previously left sessions open overnight or stepped away mid-task are now finding that returning to those sessions immediately consumes a significant portion of their usage quota just to restore context state.
The policy history behind this change adds important nuance. Anthropic had actually introduced a one-hour cache for Claude Code around February 1, 2026, before reverting it roughly five weeks later around March 7, 2026. The reversion returned Claude Code to the default five-minute ephemeral TTL that governs standard prompt caching across Anthropic's platform. Anthropic's prompt caching system technically supports two tiers: a five-minute TTL at a 25% token write surcharge, and a one-hour TTL at a 100% write surcharge, with cache reads priced at approximately 10% of the base token rate. The company's internal rationale, articulated by engineer Jarred Sumner, was that the shorter TTL makes Claude Code cheaper overall for the majority of single-shot or short-session use cases where cache reuse is limited. Critics like developer Sean Swanson countered that the policy disproportionately penalizes users engaged in deep, iterative work — precisely the use case Claude Code is marketed toward.
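The pricing tiers quoted above lend themselves to a quick break-even check. The following is a minimal sketch, not Anthropic's billing logic: it uses only the multipliers stated in the text (1.25x and 2x for writes, roughly 0.1x for reads), and the function name `relative_cost` is hypothetical. It assumes every request after the first arrives inside the TTL and reuses the same prefix.

```python
from typing import Optional

# Multipliers relative to the base input-token price, as quoted in the text.
WRITE_5M = 1.25    # five-minute TTL: 25% write surcharge
WRITE_1H = 2.00    # one-hour TTL: 100% write surcharge
CACHE_READ = 0.10  # cache hit: roughly 10% of the base rate


def relative_cost(requests: int, write_multiplier: Optional[float]) -> float:
    """Cost of sending the same prefix `requests` times, in units of one uncached pass.

    Assumption: every request after the first is a cache hit.
    Pass write_multiplier=None to model no caching at all.
    """
    if requests < 1:
        raise ValueError("requests must be at least 1")
    if write_multiplier is None:
        return float(requests)
    # One cache write, then (requests - 1) discounted reads.
    return write_multiplier + CACHE_READ * (requests - 1)


for n in (1, 2, 3, 10):
    print(n, relative_cost(n, None), relative_cost(n, WRITE_5M), relative_cost(n, WRITE_1H))
```

Under these assumptions, a single request is more expensive with caching than without (1.25 or 2.0 versus 1.0), which is consistent with the argument that short TTLs suit single-shot use. The five-minute tier pulls ahead of the uncached cost on the second request, and the one-hour tier on the third.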
The downstream impact on user experience is significant. For developers working on complex, multi-hour coding tasks, the five-minute TTL eliminates the economic benefit of caching across any interruption longer than five minutes. A session with hundreds of thousands of tokens in context (not unusual for large codebases) incurs full re-ingestion costs upon resumption, turning what should be a seamless workflow into a financially punishing restart. The problem is compounded by reports that broken caching implementations in third-party tools had already been silently depleting user quotas. Some developers therefore faced double exposure: paying for cache writes that were never reused, then paying again when expired caches had to be reprocessed.
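To make the effect of an interruption concrete, the sketch below totals the input cost of a session with idle gaps between turns. It is an illustration under simplifying assumptions, not a model of Claude Code's actual accounting: the context size stays constant, each cache hit refreshes the TTL, output tokens are ignored, and the function name `session_cost` and the gap values are hypothetical. The 332,600-token figure is taken from the in-app prompt quoted earlier.

```python
def session_cost(context_tokens, gaps_minutes, ttl_minutes,
                 write_multiplier, read_multiplier=0.10):
    """Input cost of a session, expressed in base-rate-equivalent tokens.

    gaps_minutes[i] is the idle time before turn i + 1.
    The first turn always writes the cache.
    """
    total = context_tokens * write_multiplier  # first turn populates the cache
    for gap in gaps_minutes:
        if gap <= ttl_minutes:
            # Cache still warm: pay the discounted read rate.
            total += context_tokens * read_multiplier
        else:
            # Cache expired: the whole context is written again.
            total += context_tokens * write_multiplier
    return total


CONTEXT = 332_600                 # tokens, from the in-app prompt quoted in the text
with_break = [2, 3, 45, 2, 4]     # hypothetical gaps, including one 45-minute break
no_break = [2, 3, 4, 2, 4]        # same session without the break

for label, gaps in (("with break", with_break), ("no break", no_break)):
    five_min = session_cost(CONTEXT, gaps, ttl_minutes=5, write_multiplier=1.25)
    one_hour = session_cost(CONTEXT, gaps, ttl_minutes=60, write_multiplier=2.0)
    print(f"{label}: 5-minute TTL {five_min:,.0f}, one-hour TTL {one_hour:,.0f}")
```

With these made-up gaps, the session containing a 45-minute break costs about 2.9 context-passes under the five-minute TTL and 2.5 under the one-hour TTL, while the uninterrupted session costs about 1.75 and 2.5 respectively. Which tier is cheaper therefore depends on how often the user steps away for longer than five minutes.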
The broader trend this episode reflects is the ongoing tension in the AI industry between infrastructure cost optimization and developer experience. Anthropic's caching architecture, which can deliver up to 90% cost savings for long, stable prompts through prefix caching, represents a genuine technical achievement in making large-context models economically viable. However, TTL decisions are fundamentally policy choices that redistribute cost between provider and user, and the five-minute default strongly favors Anthropic's infrastructure economics over the workflows of power users. As AI coding tools compete aggressively for developer adoption, such friction points carry outsized reputational risk: developers who feel penalized for how they naturally work are likely to move to competitors offering more generous session persistence. The debate over Claude Code's cache TTL is, in miniature, a preview of the larger negotiation the industry will need to have as agentic, long-horizon AI workflows become the norm rather than the exception.