Anthropic quietly nerfed Claude Code's 1-hour cache, and your token budget is paying the price - XDA

Anthropic quietly nerfed Claude Code's 1-hour cache, and your token budget is paying the price XDA [truncated: Google News RSS provides only a snippet, not full article

Detailed Analysis

Anthropic reduced the default prompt cache time-to-live (TTL) for Claude Code from one hour to five minutes around early March 2026, a change that went unannounced and has since generated significant frustration among developers who noticed their token quotas depleting at an accelerated rate. The one-hour cache had only been introduced roughly a month prior, around February 1, 2026, making the quiet rollback particularly jarring for users who had already begun structuring their workflows around the longer caching window. Developer Sean Swanson and Claude Code creator Boris Cherny were among the prominent voices to identify and publicize the change, noting that it was in effect by March 7 without any formal changelog or communication from Anthropic. Anthropic has denied that the change constitutes a systemic cost increase, attributing higher quota consumption instead to user behavior patterns such as large contexts and infrequent cache hits.

The technical mechanics underlying the user complaints are straightforward but have meaningful financial consequences at scale. Writing to the five-minute cache carries a 25% token premium over base input pricing, whereas writing to the one-hour cache costs a full 100% premium — but reading from either cache costs approximately 10% of the base rate, making cache hits dramatically cheaper than full reprocessing. For developers running long-session or agentic workflows, the one-hour cache was highly advantageous: a single expensive write could be amortized across many reads within the session window. Shifting to the five-minute default effectively eliminates that benefit for any workflow with idle periods exceeding a few minutes, forcing repeated full reprocessing of large context prefixes that may include extensive system prompts, tool definitions, and multi-turn message histories. Claude Opus 4.6 and Sonnet 4.6's support for up to one-million-token context windows on paid plans makes this especially punishing, as each cache miss on a near-maximum context represents a substantial compute and billing event.

Anthropic's internal justification, articulated by engineer Jarred Sumner, is that the five-minute default is genuinely cheaper for the most common use case — one-shot or short-interval requests — and that the TTL is set automatically by the client based on detected usage patterns rather than being a static global setting. The company has also indicated it is exploring a 400,000-token default context option that could help manage costs for heavy users, and the one-hour cache remains available to developers who explicitly configure it via the `ttl` field in `cache_control` API calls. However, the absence of any public announcement about the default TTL change, combined with the fact that third-party harnesses and integrations built on Claude Code have no automatic mechanism to adopt the manual workaround, means a broad swath of developers encountered cost inflation without any warning or clear remediation path. Community threads on GitHub and Hacker News confirm quota drain reports clustering around March 2026, suggesting the real-world impact is more widespread than Anthropic's framing acknowledges.

The episode illuminates a broader tension in how AI infrastructure companies communicate and manage changes to underlying platform behavior that directly affect developer economics. Unlike traditional software versioning, where a changelog documents behavioral shifts, cloud AI APIs can alter default configurations in ways that manifest as billing surprises rather than functional breakages — making them easy to miss until costs have already accumulated. The fact that Anthropic's own documentation continues to recommend the one-hour TTL for latency-sensitive or infrequent long-context use cases while simultaneously shipping a five-minute default creates a gap between official guidance and actual platform behavior that erodes developer trust. As agentic AI workflows — precisely the use case that benefits most from longer cache windows — become a primary driver of API consumption, infrastructure decisions around caching defaults will carry increasing weight in the competitive dynamics between Anthropic, OpenAI, and Google, all of whom are racing to position their platforms as the foundation for production AI agents.

Read original article →

Detailed Analysis

Don't Miss a Deploy