What’s happened to Claude usage limits lately? I’m on the Pro plan using 4.6 Thinking, and asking it to refine a single email used 26% of my quota. A few prompts and the limit is nearly gone. It never used to behave like this. Have limits tightened or token usage changed recently?

A Claude Pro user reported that refining a single email with the 4.6 Thinking model consumed 26% of their usage quota, a sharp increase in consumption compared with their earlier experience. The user asked whether recent changes to token accounting, context handling, hidden reasoning usage, Pro plan limits, or model pricing could explain the faster quota depletion during typical writing, strategy, coding, and business tasks.

Detailed Analysis

Claude Pro users have begun reporting sharply accelerated quota consumption, with one reporting that a single prompt used 26% of their usage allowance on the Claude 4.6 Thinking model. The complaints describe a qualitative shift in how quickly limits are exhausted during routine tasks (refining an email, strategic planning, or moderate coding work) that previously took only a modest share of the allowance. The poster and others in the thread suggest several possible explanations: changes to token accounting, changes to how context windows are metered, the overhead of hidden chain-of-thought reasoning in extended thinking models, internal repricing, or adjustments to the Pro plan's caps.

The core technical explanation likely lies in the nature of extended thinking models themselves. When Claude uses a reasoning or "thinking" mode, the model generates a substantial internal scratchpad of intermediate reasoning steps before producing its final response. These internal tokens — even if not fully visible to the user — are real computational work that must be accounted for in some form of usage metering. As Anthropic has pushed toward more capable reasoning models (analogous to OpenAI's o-series or Google's Gemini Thinking variants), the token footprint per user interaction has grown substantially even for tasks that appear simple on the surface. A user asking to "refine an email" may prompt dozens of internal reasoning steps about tone, structure, audience, and phrasing before a single word of output is produced. This architectural reality fundamentally changes the economics of a flat-rate subscription plan.
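The arithmetic behind this can be sketched with a toy model. Everything in it is an assumption made for illustration: the token counts, the 50,000-token allowance, and the idea that usage is metered as a plain sum of input, thinking, and output tokens. The source does not say how Pro usage is actually metered.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """Token counts for one prompt/response pair (all values hypothetical)."""
    input_tokens: int     # prompt plus any conversation context sent with it
    thinking_tokens: int  # hidden reasoning tokens; 0 for a non-thinking model
    output_tokens: int    # the visible reply

    def total_tokens(self) -> int:
        # Assumption: every token counts equally toward usage.
        return self.input_tokens + self.thinking_tokens + self.output_tokens


def quota_share(interaction: Interaction, quota_tokens: int) -> float:
    """Fraction of a hypothetical token allowance used by one interaction."""
    return interaction.total_tokens() / quota_tokens


# Same email-refinement request, with and without a reasoning scratchpad.
standard = Interaction(input_tokens=800, thinking_tokens=0, output_tokens=400)
thinking = Interaction(input_tokens=800, thinking_tokens=6000, output_tokens=400)

HYPOTHETICAL_QUOTA = 50_000  # not a real Pro plan figure

for name, item in (("standard", standard), ("thinking", thinking)):
    share = quota_share(item, HYPOTHETICAL_QUOTA)
    print(f"{name}: {item.total_tokens()} tokens, {share:.1%} of allowance")
```

Under these made-up numbers the visible prompt and reply are identical, yet the thinking variant uses six times as many tokens, which is the kind of gap that could make a simple request feel disproportionately expensive.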

The broader context is a tension across the AI industry between offering compelling flat-rate subscription tiers and the rising compute costs of frontier reasoning models. Anthropic, like its competitors, must balance user acquisition and retention through predictable pricing against the fact that newer reasoning models can cost substantially more to run per interaction than their predecessors. The shift users are noticing may reflect Anthropic quietly adjusting how usage is metered for thinking models, applying more conservative limits as these models gain traction, or simply users migrating toward more computationally intensive models without changing their usage habits.

For power users who rely on Claude Pro for high-volume writing, business strategy, or coding work, the practical implication is a need to rethink prompt strategy when using extended thinking models. The community has begun exploring several mitigations: batching requests, providing more concise context, switching to standard (non-thinking) model variants for simpler tasks, and reserving reasoning-heavy models for genuinely complex problems. Whether Anthropic has made explicit policy changes or users are simply hitting the ceiling of what a fixed-price subscription can sustain as model capabilities increase, the experience signals a structural pressure point that is unlikely to resolve on its own as AI models grow more capable, and more expensive, over time.
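Two of these mitigations, routing simple tasks to a standard model and batching small requests, can be sketched as follows. This is a rough illustration only: the keyword list, the function names `pick_mode` and `batch_prompts`, and the "standard"/"thinking" labels are hypothetical and do not correspond to any real Claude setting or API.

```python
import re

# Hypothetical keywords suggesting a task may benefit from extended reasoning.
REASONING_KEYWORDS = {"prove", "debug", "derive", "architecture", "optimize"}


def pick_mode(task: str) -> str:
    """Return "thinking" for reasoning-heavy tasks, otherwise "standard"."""
    # Match whole words so that "plan" would not match inside "explanation".
    words = set(re.findall(r"[a-z]+", task.lower()))
    return "thinking" if words & REASONING_KEYWORDS else "standard"


def batch_prompts(shared_context: str, requests: list[str]) -> str:
    """Combine several small requests so shared context is sent only once."""
    numbered = "\n".join(f"{i}. {req}" for i, req in enumerate(requests, start=1))
    return f"{shared_context}\n\nHandle each request below separately:\n{numbered}"


print(pick_mode("Refine this email to a client"))
print(pick_mode("Debug this race condition in my scheduler"))
print(batch_prompts("Context: Q3 launch notes.", ["Draft a subject line", "Shorten the intro"]))
```

A keyword check like this is deliberately crude. In practice the choice would be a manual judgment about whether a task really needs multi-step reasoning.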
