Detailed Analysis
A public conversation on social media between Claude Code users, Anthropic engineer Boris Cherny (@bcherny), and developers @kieranklaassen and @trq212 centered on unexpected and opaque token consumption under Claude's Pro and Max subscription plans. The thread began when one user found that a single prompt had consumed 18% of their Claude Max allocation, and another reported lifetime usage of more than 934 million tokens. A key technical finding emerged: context compaction, the mechanism Claude uses to summarize prior conversation history before it exceeds context limits, appears to have been silently sending large compacted contexts as part of individual requests. That inflated per-request token usage in ways users could neither anticipate nor observe in real time.
In response to the confusion, developer @kieranklaassen shared a diagnostic script that parses local Claude project logs stored in `~/.claude/projects/` and breaks down token consumption by project, session, and subagent. This tool filled a significant observability gap: users running multi-agent or agentic workflows — particularly those employing cron jobs, background orchestration, or heavy MCP (Model Context Protocol) tool integrations — had no native way to attribute token costs to specific spawned agents or automated tasks. One user noted that MCP plugins with dozens of tools can silently consume tens of thousands of tokens per session, compounding the problem for power users who connect multiple integrations without per-session selectivity.
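To illustrate the kind of breakdown such a script produces, here is a minimal Python sketch. It is not @kieranklaassen's script, and the log schema is an assumption: it supposes each project directory holds `.jsonl` files with one JSON record per line, that token counts sit under `message.usage`, and that `sessionId` and `isSidechain` identify the session and subagent activity. The function name `summarize_usage` is hypothetical, and the field names may not match what a given Claude Code version writes.

```python
import json
from collections import defaultdict
from pathlib import Path

# Assumed usage counters; the real log schema may differ between versions.
TOKEN_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def summarize_usage(root):
    """Sum token counts per (project, session, agent kind) under `root`.

    Expects a layout of <root>/<project>/<file>.jsonl, which is an assumption.
    """
    totals = defaultdict(int)
    for log_file in sorted(Path(root).glob("*/*.jsonl")):
        project = log_file.parent.name
        with log_file.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip truncated or partially written lines
                if not isinstance(record, dict):
                    continue
                message = record.get("message")
                usage = message.get("usage") if isinstance(message, dict) else None
                if not isinstance(usage, dict):
                    continue  # record carries no token accounting
                # Fall back to the file name when no session id is present.
                session = record.get("sessionId", log_file.stem)
                # Assumption: sidechain records correspond to subagent work.
                agent = "subagent" if record.get("isSidechain") else "main"
                tokens = sum(int(usage.get(field) or 0) for field in TOKEN_FIELDS)
                totals[(project, session, agent)] += tokens
    return dict(totals)
```

Calling `summarize_usage(Path.home() / ".claude" / "projects")` would return a dictionary keyed by project, session, and agent kind. Sorting it by value shows where most tokens went. Keeping cache reads and cache writes in the total is a deliberate choice here, since compacted or cached context is exactly the kind of usage the thread described as hard to see.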
The thread also surfaced tension between user frustration and Anthropic's support posture. Several users alleged that rate limits had become materially more aggressive over recent weeks, with some claiming that tasks they previously completed without issue now consistently hit their caps early in the week. Boris Cherny engaged directly, offering to run live diagnostic sessions with affected users and signaling that the team was evaluating how to make token usage visibility self-serve. Some in the thread noted and appreciated his transparency, while others criticized Anthropic sharply for what they characterized as undisclosed changes to rate limit behavior.
The exchange reflects a broader and escalating challenge in the agentic AI era: as Claude is increasingly used not just for single-turn conversations but for autonomous, multi-step, multi-agent workflows, the cost and resource models inherited from simpler chat interfaces become structurally inadequate. Token budgets that were sufficient for conversational use are rapidly exhausted by orchestration patterns involving tool calls, context compaction cycles, and subagent delegation, none of which are currently surfaced with granularity in Anthropic's native billing interface. The community-built diagnostic script, which was being discussed as a candidate for `npx` distribution, is a direct artifact of this tooling gap.
The incident also illustrates an emerging norm: developer communities that depend on AI infrastructure companies for production workloads now hold them to SRE-level accountability standards. Users asking for Splunk-style observability interfaces and for usage dashboards that sync across devices and billing cycles are not hobbyists but professional operators treating Claude as critical infrastructure. Anthropic's response, a commitment to making diagnostics self-serve together with active engagement from a named engineer, represents the kind of developer-relations accountability that will likely become a baseline expectation as agentic AI platforms mature and their economic stakes rise.