Why did you take my token but give no response?

A Claude Pro subscriber reported that a request used up their token allowance without returning a response, after which the system told them to wait five hours before retrying. When the waiting period ended, the service still did not respond, and the user expressed frustration that Anthropic does not clearly communicate its usage limits and rate-limiting policies upfront.

Detailed Analysis

A Claude Pro subscriber's public complaint highlights a recurring friction point between Anthropic's stated capacity expansions and the day-to-day experience of paying users. The user describes a specific sequence of failures: a multi-minute Claude Code analysis task consumed the session's entire token allocation before returning any output, and a five-hour lockout followed. On returning at the prescribed time, the user hit a second failure state in which neither Claude Sonnet 4.6 nor Claude Opus 4.7 would respond at all, forcing a complete restart of the conversation thread. The user also notes the compounding cost of this failure: tokens were spent on the first attempt without producing a result, and resuming the task would require spending a comparable allocation again with no guarantee of a different outcome.

The complaint surfaces two distinct but related product problems. The first is a transparency deficit: by the user's account, Anthropic offers no explicit, user-facing explanation of how session token limits are calculated, when they reset, or how cost is allocated when a response fails to complete. The second is what appears to be a system-level failure in which a timeout or context-window overflow discards both the computation and the output, rather than returning a partial result or preserving the session state. For a non-technical user, one who describes having no programming background and relying on the Mac app or browser interface, these mechanics are entirely opaque, so the experience feels arbitrary and punitive rather than the result of a technical constraint.

The frustration carries additional weight given the backdrop the user explicitly invokes: Anthropic has publicly promoted infrastructure investments and increased compute availability as selling points for its subscription tiers. When marketing communications emphasize expanded capacity while users simultaneously experience silent failures and unresponsive sessions, the credibility gap becomes a reputational issue as much as a product one. This is especially true for Pro subscribers, who represent a self-selected, high-engagement cohort willing to pay for premium access and who are therefore most likely to push against usage boundaries.

The broader trend this complaint reflects is the persistent tension in consumer-facing AI products between the stochastic, compute-intensive nature of large language model inference and the reliability expectations that subscription pricing naturally creates. Unlike traditional SaaS products, where a paid tier reliably unlocks a defined feature set, LLM services involve variable response times, context window limits, rate throttling, and infrastructure contention that can produce highly inconsistent experiences. Anthropic, OpenAI, and Google have all faced versions of this criticism, and the industry has yet to converge on a standard for communicating these constraints clearly to non-technical users. The common practice of referring vaguely to "usage limits" without quantitative disclosure tends to erode user trust, because it lacks the specificity users need to plan around restrictions.

The case also illustrates an underappreciated accessibility dimension of AI rate limiting. The user notes that English is their fourth language and that they are a complete programming novice, which compounds the difficulty of navigating opaque system behaviors and community forums for resolution. As AI tools market themselves to increasingly global and non-technical audiences, the design of failure states and limit communications becomes a meaningful equity issue. For paid products, clear, multilingual, plain-language explanations of constraints are not merely a convenience feature but a baseline expectation.

