← Google News

The Claude-lash is here: Opus 4.7 is burning through tokens — and some people's patience - Business Insider

Google News · April 17, 2026
The Claude-lash is here: Opus 4.7 is burning through tokens — and some people's patience Business Insider [truncated: Google News RSS provides only a snippet, not full article

Detailed Analysis

Anthropic's Claude Opus 4.7, released on April 16, 2026, has generated significant user frustration despite the company maintaining identical per-token pricing to its predecessor, Opus 4.6. The backlash centers on a structural change that decouples sticker price from actual cost: a new tokenizer that maps the same input text to between 1.0 and 1.35 times more tokens than before, effectively raising per-request expenditures by up to 35% without any change to the published rate of $5 per million input tokens and $25 per million output tokens. Compounding this, the model is architecturally inclined to "think more" — generating substantially higher output token counts, particularly in agentic and multi-turn workflows where it was explicitly designed to operate. For developers and enterprises running high-volume, automated pipelines, the real-world cost differential between Opus 4.6 and Opus 4.7 can be material even when billing dashboards show the same nominal price per token.

The irony is that Opus 4.7 represents a genuine technical leap in exactly the use cases where cost sensitivity is highest. The model posts top benchmark results, passes three TBench tasks that prior Claude models failed, improves throughput from approximately 72 to 81 tokens per second, and is purpose-built for long-running asynchronous agent tasks — coding across large codebases, multi-step enterprise workflows, document drafting, and data analysis. Its 1 million token context window comes with no long-context price premium, and it is available across major cloud infrastructure including Amazon Bedrock, Google Vertex AI, and Microsoft Foundry. Anthropic has also published mitigation guidance, pointing users toward effort parameters, conciseness prompting, prompt caching (which can reduce costs by up to 90%), and batch processing (50% discount). These levers exist, but they require active engineering effort and workflow changes that not all teams are positioned to absorb immediately upon migration.

The broader dynamic at play reflects a tension that is becoming increasingly common as frontier AI models grow more capable and more deeply embedded in production systems. Token pricing has long served as the primary unit of cost transparency in the API economy, but as models become more "agentic" — reasoning through problems, self-correcting, and operating across extended contexts — the relationship between tokens consumed and value delivered grows harder to predict from the outside. Anthropic's decision to optimize Opus 4.7 for high-effort agentic performance rather than token economy is a deliberate architectural choice, not an oversight, but it transfers the burden of cost management from the model provider to the developer. That shift is what is driving the "Claude-lash": users who expected a drop-in upgrade at equivalent cost are instead discovering they need to re-instrument their applications.

This episode sits within a wider industry trend of AI labs prioritizing capability benchmarks and agentic performance over raw inference efficiency as the competitive frontier moves from chat assistants to autonomous workflows. OpenAI, Google DeepMind, and Anthropic are all racing to build models that can operate reliably over long task horizons, and that ambition inherently rewards models that reason extensively rather than respond concisely. The tokenomics of agentic AI — where a single user-initiated task might spawn dozens of internal reasoning steps and tool calls — remain poorly understood by many enterprise buyers who evaluated these models in simpler chat contexts. Opus 4.7's reception suggests that as the industry pivots toward agents-as-a-product, transparent cost modeling and migration tooling will need to become as central to model releases as benchmark scores.

Read original article →