Is it just me, or did anyone else get mad at dropping the cache TTL to 5 minutes?

A developer expressed frustration with Anthropic's reduction of the prompt cache TTL to 5 minutes and built a workaround that injects a new message every 4 minutes 50 seconds using Claude's hooks and tmux. The workaround code was shared on GitHub for other users hitting the same caching limitation.

Detailed Analysis

Anthropic's unannounced reduction of the default prompt cache time-to-live (TTL) from one hour to five minutes, implemented on March 6, 2026, has ignited widespread frustration among developers building on the Claude API. The change, which went live without a changelog entry, blog post, or deprecation notice, effectively invalidated caching strategies that thousands of developers had built around the assumption of hour-long cache persistence. The consequences were immediate and measurable: Claude Code sub-agents in particular saw near-zero cache hit rates, dramatically faster token quota depletion, and reported cost increases ranging from 15% to 53% depending on workload patterns. A Reddit user known as NomaDamas captured the sentiment of many affected developers by describing the shift as a "horrible move" and publishing a GitHub repository — "Saving-Private-Token" — offering a workaround that injects a message every four minutes and fifty seconds using Claude's hooks and tmux to artificially keep caches alive before they expire.
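The timing idea behind the workaround can be sketched in a few lines of Python. This is not the code from the "Saving-Private-Token" repository, which was not inspected here; it is a minimal illustration that assumes a Claude session is running in a tmux pane and that sending any short message before the five-minute window closes keeps the cache warm. The pane target, the `ping` message, and all function names are hypothetical, and the Claude hooks wiring is omitted.

```python
import subprocess
import time

CACHE_TTL_SECONDS = 5 * 60               # five-minute TTL described in the article
REFRESH_INTERVAL_SECONDS = 4 * 60 + 50   # 4 min 50 s, the interval the workaround uses


def refresh_due(last_activity: float, now: float,
                interval: float = REFRESH_INTERVAL_SECONDS) -> bool:
    """Return True once `interval` seconds have passed since the last message."""
    return now - last_activity >= interval


def tmux_send_command(target: str, message: str) -> list[str]:
    """Build a `tmux send-keys` invocation that types `message`, then presses Enter."""
    return ["tmux", "send-keys", "-t", target, message, "Enter"]


def keep_alive(target: str, message: str = "ping",
               max_refreshes: int = 3, poll_seconds: float = 1.0) -> None:
    """Send `message` to the tmux pane each time the refresh interval elapses.

    `target` is a hypothetical pane name such as "claude:0".
    """
    last_sent = time.monotonic()
    sent = 0
    while sent < max_refreshes:
        if refresh_due(last_sent, time.monotonic()):
            subprocess.run(tmux_send_command(target, message), check=True)
            last_sent = time.monotonic()
            sent += 1
        time.sleep(poll_seconds)
```

The ten-second margin below the TTL leaves room for scheduling jitter and network latency. Each injected message is still a billed request, so this approach trades many small cached reads for fewer full cache rewrites.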

The technical and financial implications of the change extend beyond simple inconvenience. Prompt caching in Claude's API works by storing processed context on Anthropic's servers, allowing subsequent calls that reuse that context to pay significantly reduced token costs. Writing to a five-minute cache costs 25% more than the base input price, while writing to a one-hour cache costs 100% more. The one-hour option is pricier per write, but a single write survives pauses of up to an hour, whereas a five-minute cache expires during any pause longer than five minutes and must be rewritten at the 25% premium. That made the longer TTL economically favorable for workflows whose sessions run well beyond a few minutes with gaps between calls. Developers running agentic pipelines, long coding sessions, or multi-turn document analysis found themselves paying repeatedly to re-cache the same large context windows. Compounding the problem, disabling telemetry in Claude Code was found to force the five-minute TTL even when beta headers were explicitly set, and sub-agent TTL behavior was reported as unstable in the weeks following the change, oscillating unpredictably between the two durations.
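The trade-off between the two write prices can be made concrete with a small cost model. The sketch below uses the 25% and 100% write premiums quoted above; the 10% cache-read price and the rule that every hit resets the TTL clock are assumptions not stated in this article, and the gap sequences are invented for illustration. Costs are expressed relative to sending the cached prefix once at the base input price.

```python
WRITE_MULTIPLIER = {"5m": 1.25, "1h": 2.00}   # write premiums quoted in the article
TTL_SECONDS = {"5m": 5 * 60, "1h": 60 * 60}
READ_MULTIPLIER = 0.10                         # assumption: reads cost 10% of base input


def session_cost(gaps_seconds: list[float], ttl: str,
                 read_multiplier: float = READ_MULTIPLIER) -> float:
    """Relative cost of the cached prefix across one session.

    The first call always writes the cache. Each later call arrives
    `gap` seconds after the previous one: it is a cheap read if the gap
    is shorter than the TTL, otherwise the cache has expired and the
    prefix is written again. Assumes each hit restarts the TTL window.
    """
    cost = WRITE_MULTIPLIER[ttl]
    for gap in gaps_seconds:
        if gap < TTL_SECONDS[ttl]:
            cost += read_multiplier
        else:
            cost += WRITE_MULTIPLIER[ttl]
    return cost


# A coding session with two pauses longer than five minutes (hypothetical).
bursty = [120, 600, 90, 900, 60]
print(round(session_cost(bursty, "5m"), 2))   # 4.05: three writes plus three reads
print(round(session_cost(bursty, "1h"), 2))   # 2.5: one write plus five reads

# Rapid back-to-back calls with no long pause (hypothetical).
rapid = [30, 30, 30]
print(round(session_cost(rapid, "5m"), 2))    # 1.55
print(round(session_cost(rapid, "1h"), 2))    # 2.3
```

Under these assumptions the five-minute cache is cheaper only when calls never pause for more than five minutes. A single expiry already costs more than the extra premium of the one-hour write.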

Anthropic's response to the controversy has been limited and defensive in character. Jarred Sumner, associated with the Claude Code team, defended the change by arguing that five-minute TTLs are cheaper for one-shot calls, the most common use case in Claude Code's main agent, and noted that the main agent itself retained the one-hour TTL. However, critics countered that the silent nature of the change, combined with the lack of any global configuration setting allowing developers to override the default, left no reasonable path for affected users to adapt proactively. Community-sourced workarounds have since proliferated, including explicitly setting `"ttl": 3600` in cache control parameters (Anthropic's documented `cache_control` format expresses a one-hour TTL as `"ttl": "1h"`), adding the `anthropic-beta: prompt-caching-2024-07-31` header, logging cache hits and misses to detect degradation, and implementing proactive cache refresh for sessions exceeding 45 minutes. The "Saving-Private-Token" repository represents the blunter end of this spectrum: a timing-based hack that treats a pricing infrastructure decision as an adversarial constraint to route around.
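Of the workarounds listed above, logging cache hits and misses is the easiest to sketch. The example below assumes each API response carries a usage record with `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens` counts, which matches the Messages API usage object as commonly documented but should be checked against the current reference. The function names and the 50% alert threshold are hypothetical.

```python
def cache_hit_ratio(usages: list[dict]) -> float:
    """Fraction of all input tokens that were served from the cache.

    Each item in `usages` is one response's usage record as a dict.
    Missing or null counters are treated as zero.
    """
    read = sum(u.get("cache_read_input_tokens") or 0 for u in usages)
    written = sum(u.get("cache_creation_input_tokens") or 0 for u in usages)
    uncached = sum(u.get("input_tokens") or 0 for u in usages)
    total = read + written + uncached
    return read / total if total else 0.0


def cache_degraded(usages: list[dict], threshold: float = 0.5) -> bool:
    """Flag a window of requests whose hit ratio fell below `threshold`."""
    return cache_hit_ratio(usages) < threshold


# Hypothetical usage records from two consecutive windows of requests.
warm_window = [
    {"input_tokens": 100, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 900},
    {"input_tokens": 100, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 900},
]
cold_window = [
    {"input_tokens": 100, "cache_creation_input_tokens": 900, "cache_read_input_tokens": 0},
    {"input_tokens": 100, "cache_creation_input_tokens": 900, "cache_read_input_tokens": 0},
]

print(cache_hit_ratio(warm_window), cache_degraded(warm_window))  # 0.9 False
print(cache_hit_ratio(cold_window), cache_degraded(cold_window))  # 0.0 True
```

A run of windows that keep writing the same prefix without ever reading it back is the pattern a silently shortened TTL would produce, so tracking the ratio over time gives an early signal even when no changelog entry exists.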

The episode reflects a broader and growing tension in the AI developer ecosystem around the opacity of infrastructure changes by major model providers. Unlike traditional software dependencies, where breaking changes trigger semantic versioning and release notes, AI API providers have historically treated operational parameters like caching behavior, rate limits, and model routing as internal concerns subject to change without notice. As developers increasingly build production-grade, cost-sensitive applications on top of these APIs — particularly agentic systems where context windows are large and session durations are long — the expectation gap between provider flexibility and developer stability is widening. The TTL controversy joins a pattern of similar friction points, including undocumented model capability regressions and shifting rate limit policies, that have prompted calls within the developer community for more formal change management practices from Anthropic and its peers. Whether Anthropic responds with explicit TTL configuration options, restored defaults, or clearer communication norms remains to be seen, but the volume and specificity of community pushback suggest the issue has reached a threshold that warrants a substantive policy response.
