Detailed Analysis
A technical advisory analyzing Claude Code's behavior under Anthropic's Opus 4.7 model has documented a striking pattern of cache read accumulation: approximately 16 billion cached tokens consumed across just eight agentic coding sessions. The analysis, surfaced via a GitHub advisory repository focused on hidden problems in Claude Code, draws its conclusions from forensic examination of the JSONL session files that Claude Code maintains locally on disk to track conversation state, tool calls, and message history. These JSONL logs — one per session — serve as the ground truth for token accounting and reveal not just what was said, but what was read from cache at each turn, making them a powerful instrument for diagnosing runaway token consumption.
The scale of the figure demands context. Claude Opus 4.7 introduced a 1 million token context window alongside a new tokenizer that can consume between 1x and 1.35x more tokens than prior model generations depending on content type. In long-horizon agentic workflows — precisely the use case Opus 4.7 was optimized for — Claude Code re-reads accumulated session context at each turn, and cache reads can compound rapidly as sessions grow. Eight sessions averaging 2 billion cache reads each suggests that either sessions are being allowed to accumulate extraordinary context depths without truncation, or that the model's improved file-system memory features (scratchpads, persistent notes across turns) are generating repeated large-context retrievals that were not anticipated at this scale by developers integrating the API.
The forensic methodology itself is noteworthy. By parsing JSONL session logs — which record every message, tool invocation, and API response including token usage metadata — the analyst reconstructed the full cache read history post-hoc. This approach mirrors an incident documented on Anthropic's own GitHub issue tracker, in which 160 session JSONL files totaling roughly 1GB and 60,000 messages were accidentally deleted, leaving only the sessions-index.json intact. That incident underscored how much operational and diagnostic value lives inside those flat files, and the Opus 4.7 advisory leverages the same data architecture to expose a cost and performance problem invisible to users relying solely on aggregate billing dashboards.
The broader implication for teams deploying Claude Code at scale is significant. Cache reads, while priced at a fraction of input token costs in Anthropic's API pricing structure, are not free, and at 16 billion tokens the numbers become material even at discounted rates. More critically, the pattern indicates that session context is not being bounded or summarized aggressively enough in long agentic runs, which also affects latency and model coherence as context windows fill. The Opus 4.7 tokenizer change — introduced partly to improve multilingual and code-heavy content handling — appears to interact with this dynamic in ways that amplify token accumulation rather than mitigate it.
This advisory fits within a wider pattern of the developer community conducting independent forensic analysis of AI coding tools to surface behaviors that official documentation does not anticipate. As models like Opus 4.7 push toward longer autonomous operation windows, the gap between designed behavior and emergent resource consumption in real workflows is becoming a meaningful engineering concern. Anthropic has positioned Claude Code's JSONL session architecture as a feature enabling continuity and memory, but the same architecture, combined with a larger context window and a new tokenizer, is creating an observability and cost-management challenge that the ecosystem is only beginning to instrument and address.
Read original article →