Changing effort level invalidates cache?

A user experimenting with Claude's Messages API discovered that changing the effort level setting between turns in a conversation invalidates cached content, forcing the API to rewrite cache entries instead of reading previously cached system prompts and messages. The behavior was consistent across testing: cache written at one effort level could only be accessed on subsequent turns using the same effort level. The user could not locate documentation confirming whether this cache invalidation pattern is intended behavior or a potential bug.

Detailed Analysis

A developer experimenting with Anthropic's Claude Messages API has surfaced an undocumented behavior: changing the `effort` parameter between turns in a multi-turn conversation appears to make previously written prompt caches unreadable, forcing the system to re-cache content even when the underlying text is identical. In the reported experiment, which used Claude Sonnet 4.6 with adaptive thinking, a Turn 1 request at `effort: high` wrote cache entries at two breakpoints, one on the system prompt and one on the user message. When Turn 2 was issued at `effort: low` with the same system prompt and prior messages, no cache reads were registered and the content was cached anew. Returning to `effort: high` in Turn 3 restored access to the Turn 1 cache while ignoring the Turn 2 entries. The Turn 1 cache was therefore never deleted, only unreachable from a different effort level, which suggests that the effort level acts as a cache namespace or key component rather than a transparent execution modifier.
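The "cache namespace" reading of the three-turn experiment can be illustrated with a toy model. The sketch below is not Anthropic's implementation and does not call the API. `EffortKeyedCache` is a hypothetical class that keys each cached prefix by the pair (effort level, prefix hash) and reports counters named after the `cache_creation_input_tokens` and `cache_read_input_tokens` usage fields. Word counts stand in for real token counts, every block is treated as a cache breakpoint, and the message texts are invented for illustration.

```python
import hashlib


class EffortKeyedCache:
    """Toy prompt cache whose keys include the effort level (hypothetical)."""

    def __init__(self):
        self._entries = set()

    def request(self, effort, blocks):
        """Simulate one turn and return usage-style token counters."""
        read = written = 0
        prefix = hashlib.sha256()
        for block in blocks:
            # The hash covers everything up to and including this block.
            prefix.update(block.encode("utf-8"))
            key = (effort, prefix.hexdigest())
            tokens = len(block.split())  # crude stand-in for a real tokenizer
            if key in self._entries:
                read += tokens
            else:
                self._entries.add(key)
                written += tokens
        return {
            "cache_creation_input_tokens": written,
            "cache_read_input_tokens": read,
        }


# Invented conversation content, for illustration only.
system = "You are a terse assistant."
user1 = "Summarize the attached report."
reply1 = "The report covers quarterly revenue."
user2 = "List the main risks."
reply2 = "Currency exposure and churn."
user3 = "Which risk matters most?"

cache = EffortKeyedCache()
turn1 = cache.request("high", [system, user1])
turn2 = cache.request("low", [system, user1, reply1, user2])
turn3 = cache.request("high", [system, user1, reply1, user2, reply2, user3])

for name, usage in (("turn1", turn1), ("turn2", turn2), ("turn3", turn3)):
    print(name, usage)
```

Under this assumption the model reproduces the reported pattern: Turn 2 reads nothing despite an identical prefix, and Turn 3 reads exactly what Turn 1 wrote. Whether the real service behaves this way for the same reason remains unconfirmed.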

The practical implication of this behavior is significant for developers building cost-optimized, multi-turn applications on top of Claude. Prompt caching is one of Anthropic's primary mechanisms for reducing token costs on repeated or long-context content — cache reads are charged at a fraction of standard input token prices. If effort level changes silently segment cache storage, applications that dynamically adjust effort based on query complexity (a natural optimization strategy) would inadvertently forfeit those savings. Developers would face an invisible trade-off: either lock effort levels across an entire conversation session to preserve cache continuity, or accept redundant caching costs when adapting effort mid-conversation. The fact that this behavior is absent from Anthropic's official documentation compounds the problem, as developers cannot plan around an undocumented constraint.
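A rough cost comparison makes the trade-off concrete. The sketch below assumes that a cached prefix is readable only at the effort level that wrote it, as the experiment suggests. The function names, the base price of 3.00 dollars per million input tokens, the cache-write multiplier of 1.25, and the cache-read multiplier of 0.10 are all illustrative assumptions to be checked against current pricing. The model also ignores prefix growth between turns and cache expiry.

```python
def input_cost(tokens, base_price_per_mtok, multiplier):
    """Dollar cost of `tokens` input tokens at a multiple of the base price."""
    return tokens / 1_000_000 * base_price_per_mtok * multiplier


def prefix_cost(prefix_tokens, efforts, base_price_per_mtok=3.0,
                write_mult=1.25, read_mult=0.10):
    """Cost of a shared prefix across turns, one effort level per turn.

    Assumes (hypothetically) one cache partition per effort level: the
    first turn at each level pays a cache write, later turns at that
    level pay a cache read.
    """
    seen = set()
    total = 0.0
    for effort in efforts:
        mult = read_mult if effort in seen else write_mult
        seen.add(effort)
        total += input_cost(prefix_tokens, base_price_per_mtok, mult)
    return total


# Four turns over a 50,000-token prefix (illustrative numbers).
locked = prefix_cost(50_000, ["high", "high", "high", "high"])
alternating = prefix_cost(50_000, ["high", "low", "high", "low"])

print(f"locked effort:      ${locked:.4f}")
print(f"alternating effort: ${alternating:.4f}")
```

With these placeholder numbers, alternating between two effort levels pays for two cache writes instead of one. The gap widens with every additional effort level used in the session.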

A technical rationale for the behavior is plausible, though Anthropic has not stated one. The `effort` parameter, which governs the depth of Claude's extended or "adaptive" thinking, likely influences internal model execution in ways that make cached key-value states from a different effort configuration incompatible or potentially misleading. If the model's internal attention patterns or intermediate representations differ meaningfully across effort levels, reusing a cache generated under different computational assumptions could produce subtly degraded outputs. Anthropic may have deliberately segmented caches by effort level as a correctness safeguard rather than a cost decision, though absent documentation this remains speculative.

This discovery fits into a broader pattern of complexity emerging around frontier AI APIs as they accumulate layered features — prompt caching, extended thinking, tool use, streaming, and now effort-level controls — each of which can interact with others in non-obvious ways. The gap between what is documented and what developers encounter in production is a recurring friction point across the industry. Anthropic has been iterating rapidly on its API surface, and the `effort` parameter itself is relatively new, which may explain why its interaction with caching has not yet been formally specified. The developer's instinct to raise the issue with Anthropic is appropriate, as this interaction warrants either explicit documentation clarifying that this is intended behavior, or a bug investigation if it represents an unintended limitation.
