I asked Claude to investigate its own token burn. The receipts go back six months.

An investigation found significant token billing discrepancies on Claude Max plans, attributed to several unacknowledged cache-related bugs that have persisted for over six months. These include a billing-related word-substitution routine that forces cache rebuilds at an estimated 10-20× cost, and silent cache invalidation when sessions are started with the `--resume` or `--continue` flags. None of these issues appear in official release notes despite sustained community reports, and Anthropic reportedly confirmed peak-hour throttling only after press contact.

Detailed Analysis

A developer posting on Reddit identified a cluster of billing and caching bugs in Anthropic's Claude Code tooling that appear to have caused Max plan subscribers to exhaust their token allocations significantly faster than expected. The investigation began when the developer directed a Claude Opus 4.7 agent to audit its own token consumption. The audit showed approximately 127,000 tokens billed across eight conversational turns for about 25,000 tokens of unique content, a ratio of roughly 5:1 that prompted the agent to analyze its own session logs. That analysis surfaced GitHub issues dating to mid-December 2025, two reverse-engineered defects in the Claude Code binary itself, and a community-authored patch that Anthropic had not yet incorporated into an official release.

The four distinct failure modes identified carry substantial practical consequences. The first involves a billing-related word substitution routine in the Claude Code binary that misfires on common terminology, forcing a full cache rebuild on each conversational turn and inflating costs by an estimated 10 to 20 times. The second affects users of the `--resume` and `--continue` flags, which silently invalidate the prompt cache at session resumption and bill the first turn at full, uncached rates. A third issue creates an unexpected penalty for privacy-conscious users: disabling telemetry reportedly disables the one-hour cache time-to-live window, meaning users who opt out of data collection lose a core cost-control mechanism without any documented warning. A fourth issue involves peak-hour throttling during GMT afternoon hours, which Anthropic allegedly only confirmed after press contact and never published with quantitative detail. Notably, none of these cache-related defects appear in any Anthropic release notes despite reportedly six weeks of acute community reports at the time of writing.
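One way to sanity-check the "10 to 20 times" estimate is to compare what a prompt prefix costs when it is rewritten to the cache versus when it is read back from it. The sketch below assumes the multipliers from Anthropic's prompt-caching pricing documentation (cache reads at 0.1× the base input price, 5-minute cache writes at 1.25×, 1-hour cache writes at 2×). These figures are not taken from the original post and should be checked against current pricing; the post does not say whether the author derived the estimate this way. The function names are illustrative only.

```python
# Assumed multipliers relative to the base input-token price.
# These are assumptions for illustration, not figures from the post.
CACHE_READ = 0.10       # reading an already cached prefix
CACHE_WRITE_5M = 1.25   # writing a prefix with the 5-minute TTL
CACHE_WRITE_1H = 2.00   # writing a prefix with the 1-hour TTL


def rebuild_penalty(write_multiplier: float,
                    read_multiplier: float = CACHE_READ) -> float:
    """How many times more a rewritten prefix costs than a cache hit on it."""
    return write_multiplier / read_multiplier


def turn_cost(prefix_tokens: int, new_tokens: int, cache_hit: bool,
              write_multiplier: float = CACHE_WRITE_5M) -> float:
    """Input cost of one turn, in units of base-price tokens.

    Simplification: the new (uncached) tokens are billed at the base
    rate, and output tokens are ignored entirely.
    """
    prefix_rate = CACHE_READ if cache_hit else write_multiplier
    return prefix_tokens * prefix_rate + new_tokens


if __name__ == "__main__":
    print("5-minute TTL rebuild penalty:", rebuild_penalty(CACHE_WRITE_5M))
    print("1-hour TTL rebuild penalty:  ", rebuild_penalty(CACHE_WRITE_1H))
    # A hypothetical turn with a 100,000-token prefix and 1,000 new tokens.
    print("turn with cache hit:  ", turn_cost(100_000, 1_000, cache_hit=True))
    print("turn with cache flush:", turn_cost(100_000, 1_000, cache_hit=False))
```

Under these assumptions the penalty for a flushed prefix works out to 12.5× with the 5-minute TTL and 20× with the 1-hour TTL, which is at least consistent with the range quoted above.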

The author developed and released a 50-line monitoring tool, `cc-cache-monitor`, that reads the JSONL log files Claude Code already writes to disk locally and displays per-turn cache hit rates in real time. The tool exposed 128 cache flush events in a single book-writing session, providing a concrete illustration of the bug's frequency. That the underlying diagnostic data existed on users' machines all along — and that Anthropic's own interface did not surface it — is a central point of the writeup. The author offered five interim mitigations: avoiding the GMT 1pm–7pm peak window, refraining from `--resume` and `--continue`, limiting concurrent Claude Code sessions, keeping telemetry enabled despite the privacy trade-off, and running the monitoring tool to observe cache failures as they occur.
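The kind of per-turn calculation such a monitor performs can be sketched as follows. This is not the author's `cc-cache-monitor` code. It assumes each JSONL record that represents a model response carries a `message.usage` object with the Anthropic API usage fields `input_tokens`, `cache_creation_input_tokens`, and `cache_read_input_tokens`; the record layout is an assumption and may differ between Claude Code versions. The definition of a "flush" used here (a turn after the first whose hit rate falls below a threshold) is also hypothetical, since the post's own definition is not given.

```python
import json
from typing import Iterable, Iterator, List, Optional


def cache_hit_rate(usage: dict) -> Optional[float]:
    """Fraction of a turn's input tokens that were served from cache."""
    read = usage.get("cache_read_input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    fresh = usage.get("input_tokens") or 0
    total = read + written + fresh
    return read / total if total else None


def per_turn_rates(lines: Iterable[str]) -> Iterator[float]:
    """Yield one cache hit rate per log line that carries usage data."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if not isinstance(record, dict):
            continue
        # Assumed layout: usage is nested under the record's "message" key.
        message = record.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        rate = cache_hit_rate(usage)
        if rate is not None:
            yield rate


def count_flushes(rates: List[float], threshold: float = 0.5) -> int:
    """Count turns after the first whose hit rate is below the threshold.

    The first turn is skipped because a cold cache is expected there.
    The threshold is an arbitrary illustrative choice.
    """
    return sum(1 for rate in rates[1:] if rate < threshold)
```

To try it against a session log, open the file and pass the file object to `per_turn_rates`, for example `rates = list(per_turn_rates(open(path, encoding="utf-8")))`, where `path` points at one of the JSONL files Claude Code writes locally.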

The episode sits within a broader pattern of tension between AI infrastructure providers and power users who operate at the edges of token-intensive workflows. As subscription tiers like Anthropic's Max plan promise premium model access, the opacity of token billing mechanics becomes a significant trust and accountability issue. Unlike traditional SaaS pricing, where a unit of consumption is typically legible, large language model billing involves layered caching systems, binary-level preprocessing, and network-dependent behavior — any of which can introduce silent cost amplification. The fact that a sufficiently capable model was directed to audit its own session behavior and surface infrastructure defects underscores both the diagnostic potential of advanced AI agents and the degree to which production tooling around those agents may lag behind the models themselves.

The incident also raises a pointed question about disclosure norms in AI product development. That community-identified bugs with measurable billing impact went unacknowledged in official release notes for an extended period — and that a key behavior such as peak-hour throttling required press contact to confirm — suggests a gap between Anthropic's internal knowledge and its public communication practices. For an organization that has positioned transparency and user trust as core values, the failure to proactively document billing-affecting defects carries reputational weight beyond the immediate technical issue. Whether this reflects deliberate opacity, organizational bandwidth constraints, or the inherent difficulty of attributing cost anomalies in complex distributed systems, the outcome is the same: users paid more than the architecture required, without explanation.
