Pink Elephant In The Room - Anthropic Doesn't Want You Using Max Effort

I've seen countless posts of users hitting limits after the usage changes, which does say something, ty. However, I haven't seen many comments on what the effort level was, which we all know everyone sets to Max. Anthropic needs to be clear on how effort

Detailed Analysis

A Reddit post titled "Pink Elephant In The Room - Anthropic Doesn't Want You Using Max Effort" reflects growing user frustration with token consumption patterns in Claude Code following Anthropic's April 2026 introduction of effort levels in Claude Opus 4.6. The original poster, a self-described heavy user on the Max ($5) plan, reports that identical workloads now consume approximately 30% of weekly usage per day, compared to roughly 10% per day under prior conditions — a tripling of effective consumption. The post raises several specific technical grievances: cache invalidation triggered by effort-level changes wastes tokens before any new work begins; resuming sessions after closing a console or IDE incurs unexplained additional usage; and running Claude at lower effort levels produces degraded output quality that generates its own compounding inefficiencies through error-correction loops. The poster concludes with a concern that Anthropic may be deliberately rationing high-quality inference for enterprise customers, leaving consumer-tier users in an unworkable middle ground.

The core premise of the post — that Anthropic is discouraging max effort use — is not supported by Anthropic's public positioning. The company's documentation for Claude Opus 4.6 explicitly promotes all four effort tiers (low, medium, high, and max), encourages developers to experiment across them, and highlights max effort as the appropriate setting for deeply complex agentic and reasoning tasks, citing benchmark results such as 62.7% on MCP Atlas and top performance on Terminal-Bench 2.0. The API is designed to expose effort as a developer-controlled parameter, not a hidden or gatekept feature. The disconnect between the poster's perception and Anthropic's stated intent is itself revealing: when usage limits are encountered quickly enough and consistently enough, users can reasonably interpret the economics of the system as a soft prohibition, even in the absence of an explicit one. Anthropic did acknowledge in early April 2026 that Claude Code users were hitting limits "way faster than expected," framing it as a known issue rather than a design goal, but that acknowledgment did little to resolve the day-to-day frustration for high-volume users.

The technical complaints about caching and session resumption point to a real and underappreciated infrastructure challenge in agentic AI tooling. When an effort-level change triggers a full cache miss, the token cost of that context reload can dwarf the incremental cost of continuing the original prompt — a counterintuitive outcome that undermines the whole purpose of tiered effort as a cost-management tool. Similarly, the described behavior around session resumption, where reconnecting to an existing session triggers redundant context recalculation, suggests that the system's token accounting does not yet cleanly distinguish between productive computation and overhead. These are not trivial engineering problems: long-context agentic sessions are inherently stateful, and managing that state efficiently across interruptions, reconnections, and effort-level transitions requires coordination across inference, caching, and billing layers that may not yet be tightly integrated.

The broader significance of this friction lies in what it reveals about the current maturity of consumer-tier agentic AI products. Anthropic, like its peers, is navigating a transition from chatbot-style interactions — where token costs are bounded and predictable — to open-ended agentic workflows where a single session can involve hundreds of tool calls, large documentation corpora, and iterative debugging loops with highly variable compute demands. The effort-level system is a reasonable architectural response to this challenge, offering users a lever to manage quality-cost tradeoffs explicitly. But the lever is only useful if users can predict its consequences, and the poster's complaint is fundamentally about opacity: without visibility into how effort levels map to token consumption in real workloads, users cannot make informed decisions. The practical tips offered in the post — scripting repetitive tasks, using CLI tools to bypass session context, building reusable skills — are essentially workarounds for the absence of that visibility, and they hint at what better tooling in this space would eventually need to provide.

Read original article →

Detailed Analysis

Don't Miss a Deploy