Opus 4.7 reminds me of Haiku. It's a bit of nostalgia using it and all of the pain.

A Reddit user reported experiencing performance issues with Opus 4.7, describing it as equivalent to the less capable Haiku model and a regression from the previous 4.6 version that handled abstract prompts more effectively. The user expressed frustration at struggling with circular reasoning in the model's responses and indicated reluctance to switch models despite the disappointing experience.

Detailed Analysis

A Reddit user's post on r/Anthropic captures a sentiment that appears to be circulating among Claude power users: that Opus 4.7, released on April 16, 2026, represents a frustrating regression in conversational fluency compared to its predecessor, Opus 4.6. The poster describes hours of circular, unproductive interactions with the new model, likening the experience to using Claude Haiku — Anthropic's lightweight, cost-optimized tier — rather than the company's flagship reasoning model. The comparison is pointed: Haiku is explicitly designed for speed and efficiency over depth, making the analogy a stinging critique of what the user perceives as diminished intuitive responsiveness in Opus 4.7's handling of abstract, open-ended prompts.

The tension embedded in this user experience reflects a genuine architectural tradeoff embedded in Opus 4.7's design. According to Anthropic's own benchmarks, the model delivers a 13% resolution improvement on a 93-task coding benchmark over Opus 4.6, triples scores on Rakuten-SWE-Bench, and achieves 91% on CharXiv Reasoning — metrics that firmly establish it as a more capable system for structured, high-stakes technical work. Its context window expands to 1,000,000 tokens, and its vision capabilities reach 98.5% benchmark accuracy. These are not the numbers of a weaker model. What they suggest, however, is a model increasingly optimized for precision-demanding, well-specified tasks — a profile that may inherently resist the kind of loose, intuitive, "mind-reading" prompting the poster describes as Opus 4.6's strength.

This dynamic points to an underappreciated tension in large language model development: capability improvements on formal benchmarks do not always translate to improved user experience for unstructured, creative, or highly contextual workflows. Opus 4.7's self-verification mechanisms and long-task handling introduce behaviors oriented toward rigor and caution — qualities that can manifest as resistance to ambiguity rather than graceful interpretation of it. Where Opus 4.6 users apparently felt a kind of collaborative inference from the model, Opus 4.7 may demand more explicit scaffolding, rewarding precise prompting while penalizing vagueness. The poster's refusal to switch sessions or models also signals how much trust and workflow momentum users invest in a particular model version, making any perceived degradation in fluency feel especially disruptive.

Broader trends in AI development help contextualize this friction. As frontier models increasingly target agentic and multimodal benchmarks — coding resolution rates, visual document accuracy, software engineering evaluations — the implicit optimization target shifts away from generalist conversational coherence and toward task-completion fidelity in well-defined domains. Anthropic's positioning of Opus 4.7 as an "agentic coding" powerhouse, paired with a cost structure of $25 per million output tokens, signals that the model is being built for enterprise pipelines and developer toolchains rather than casual, exploratory dialogue. The Haiku comparison, however unintended as flattery, is ironic: Haiku's minimalism is by design, while the conversational minimalism users report in Opus 4.7 appears to be a byproduct of maximalism in other directions. Whether Anthropic adjusts the model's behavior in response to this kind of user feedback — as it has historically done through system prompt tuning and minor updates — remains to be seen, but the post represents a meaningful data point in the ongoing negotiation between benchmark performance and lived usability.

Read original article →

Detailed Analysis

Don't Miss a Deploy