Claude Opus 4.7 seems to use way more tokens than expected

Claude Opus 4.7 is consuming tokens at a higher rate than Anthropic's guidance suggested: real-world measurements show roughly 1.4–1.47× the token usage of previous models, against Anthropic's stated estimate of 1.0–1.35×. The higher counts deplete context budgets faster, make longer sessions accumulate tokens more quickly, and raise the effective cost per workflow. The change is likely a capability tradeoff intended to improve the model's handling of code, markdown, and structured text.

Detailed Analysis

Claude Opus 4.7's updated tokenizer is producing measurably higher token consumption than Anthropic's official documentation suggests, with meaningful cost and workflow implications for developers building on the model. Anthropic's migration materials cite an expected 1.0–1.35× increase in token count relative to previous models, but real-world testing by developers using production-representative inputs (git logs, stack traces, project instructions, and extended coding prompts) consistently yields ratios closer to 1.4–1.47×. In one concrete benchmark, the system prompt alone consumed 1.46× as many tokens on Opus 4.7 as on its predecessor, Opus 4.6. The discrepancy between documented and observed token inflation suggests that Anthropic's estimate may reflect average or controlled conditions rather than the heavier, structured inputs characteristic of developer and agentic workflows.
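One way to check figures like these against your own inputs is to compare token counts for the same text under both models. The sketch below assumes the per-sample counts have already been collected (for instance from a token counting API); the function name, sample names, and numbers are hypothetical placeholders, not measurements.

```python
def inflation_ratios(counts):
    """Return per-sample and token-weighted aggregate ratios (new / old).

    `counts` maps a sample name to a tuple (old_tokens, new_tokens).
    """
    per_sample = {name: new / old for name, (old, new) in counts.items()}
    total_old = sum(old for old, _ in counts.values())
    total_new = sum(new for _, new in counts.values())
    return per_sample, total_new / total_old


# Hypothetical counts, for illustration only.
samples = {
    "git_log": (1000, 1450),
    "stack_trace": (800, 1120),
    "project_instructions": (2000, 2920),
}

per_sample, overall = inflation_ratios(samples)
for name, ratio in per_sample.items():
    print(f"{name}: {ratio:.2f}x")
print(f"overall (token-weighted): {overall:.3f}x")
```

The aggregate here is weighted by token volume rather than averaged across samples, so large prompts dominate the result in the same way they dominate a bill.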

The practical consequences of this token inflation are threefold and compound quickly at scale. Context budgets are exhausted faster, long-running sessions accumulate token counts more rapidly, and the effective cost per workflow rises even though list prices remain unchanged at $5 per million input tokens and $25 per million output tokens, the same pricing as Opus 4.6. Because output tokens are priced at five times the input rate, Opus 4.7's documented tendency to generate more thorough responses at higher effort levels (particularly for coding and agentic tasks) amplifies cost exposure beyond what input tokenization alone would suggest. If token counts scale with the ratios above, a request that cost $0.10 on Opus 4.6 would cost up to $0.135 on 4.7 at the documented 1.35× ceiling and roughly $0.14–$0.147 at the observed 1.4–1.47×: an effective price increase of up to 35% on paper and 40–47% in practice, despite the unchanged rate card. For organizations running high-volume or long-context workloads, that delta adds up to significant budget variance.
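The arithmetic behind an effective price increase can be sketched as follows. The example assumes a hypothetical request of 10,000 input and 2,000 output tokens, and assumes both counts grow by the same ratio. That is a simplification, since output volume also depends on how verbose the model chooses to be.

```python
INPUT_USD_PER_MTOK = 5.00    # list price cited in the article
OUTPUT_USD_PER_MTOK = 25.00  # list price cited in the article


def request_cost(input_tokens, output_tokens):
    """Cost in USD of one request at the list prices above."""
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


def inflated_cost(input_tokens, output_tokens, ratio):
    """Cost if both token counts grow by `ratio` (simplifying assumption)."""
    return request_cost(input_tokens * ratio, output_tokens * ratio)


# Hypothetical request: $0.05 of input plus $0.05 of output.
baseline = request_cost(10_000, 2_000)
print(f"baseline -> ${baseline:.4f}")

for ratio in (1.35, 1.40, 1.47):
    cost = inflated_cost(10_000, 2_000, ratio)
    print(f"{ratio:.2f}x -> ${cost:.4f} ({cost / baseline - 1:.0%} increase)")
```

Because price per token is unchanged, the percentage increase in cost equals the percentage increase in tokens under this assumption. Workloads where output grows faster than input would see a larger increase.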

The tokenizer change is not incidental; it appears to reflect a deliberate engineering tradeoff. New tokenizers are typically designed to improve a model's handling of structured and semi-structured text, including code, markdown, JSON, and technical documentation, by splitting these formats into finer-grained or more meaningful units. This often improves model performance on developer-centric tasks at the cost of higher token counts for the same raw input. In Opus 4.7's case, the capability uplift appears to be the goal, with token inflation an accepted side effect rather than an oversight. Anthropic has acknowledged the difference and updated its token counting API to reflect 4.7's behavior. It recommends that teams measure against their own traffic distributions, update `max_tokens` configurations, and use features such as prompt caching, which offers a 90% discount on the cost of cached tokens, to manage the effective cost increase.
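Two of these mitigations reduce to simple arithmetic. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the 90% figure is applied only to the cached share of the input, any surcharge for writing to the cache is ignored, and the inflation ratio is whatever you measured on your own traffic.

```python
import math

CACHE_DISCOUNT = 0.90  # discount cited in the article, applied to cached input


def scaled_max_tokens(old_max_tokens, ratio):
    """Raise an output cap so the same text still fits after inflation."""
    return math.ceil(old_max_tokens * ratio)


def effective_input_tokens(input_tokens, cached_fraction):
    """Full-price-equivalent input tokens when part of the prompt is cached.

    Assumes cached tokens cost (1 - CACHE_DISCOUNT) of the normal rate and
    ignores cache-write costs.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return uncached + cached * (1 - CACHE_DISCOUNT)


# Hypothetical values: a 4,096-token cap and a measured ratio of 1.35.
new_cap = scaled_max_tokens(4096, 1.35)

# Hypothetical prompt of 14,700 tokens with 80% served from cache.
billed_equivalent = effective_input_tokens(14_700, 0.8)

print(new_cap, round(billed_equivalent))
```

Under these assumptions, a stable prefix such as a system prompt or project instructions is where caching recovers the most, since that is the share of the input that repeats across requests.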

The broader pattern here reflects a recurring tension in frontier model development: as models grow more capable, particularly for agentic and long-horizon coding tasks, the infrastructure costs of running them tend to rise in ways that are not always immediately legible from headline pricing. Claude Opus 4.7's situation illustrates that pricing parity with a predecessor model does not guarantee cost parity in practice, especially when tokenizer architecture, output verbosity, and image resolution handling all shift simultaneously. For developers and enterprises evaluating whether Opus 4.7's performance gains justify its higher effective cost, the answer will depend heavily on workload composition — teams running short, structured queries may see modest increases, while those operating deep agentic pipelines or processing large images could face substantially higher bills. The backlash documented by some users, including hitting usage limits faster on consumer-tier Claude Pro plans, underscores that the token inflation is perceptible not just in API billing but in perceived product behavior as well.
