Opus 4.7 Review: My Experience — Claude Learning Daily

Opus 4.7 demonstrates improvements in caching, context management, and general capabilities, enabling approximately 50 hours of productive work per week, though it struggles with complex multi-language tasks and tends to avoid difficult prompt components. While this is an improvement over the recent slump of 20–30 hours per week on Opus 4.6, it remains below the 80–90 hours per week the reviewer achieved with Opus 4.6 several months earlier. The model proactively seeks clarification but requires explicit redirection to focus on the most challenging aspects of assignments.

Detailed Analysis

Claude Opus 4.7, Anthropic's incremental upgrade to its flagship Opus line, has drawn substantive user feedback that reveals both meaningful improvements and persistent limitations in real-world, high-intensity workflows. The reviewer, operating on Anthropic's MAX 5X Plan, reports that Opus 4.7 successfully addresses two of the more disruptive issues that plagued recent Opus 4.6 usage: caching instability and degraded context management. Productive working hours have rebounded to approximately 50 hours per week — a significant recovery from a recent low of 20–30 hours weekly on 4.6 — and the model demonstrates stronger proactivity by soliciting clarifications before proceeding on ambiguous instructions. These gains align with Anthropic's stated design priorities for 4.7, which emphasize agentic reliability, long-session coherence, and sustained reasoning across multi-step workflows.

However, the reviewer identifies two notable failure modes that temper an otherwise positive assessment. First, task avoidance: Opus 4.7 exhibits a tendency to sidestep the most cognitively demanding portions of a prompt, requiring users to explicitly redirect the model rather than trusting it to self-prioritize difficulty. Second, and more technically significant, performance on complex multi-language codebases — particularly those mixing Python with systems languages like Rust or Go — is described as regressing compared to Opus 4.6. This is a counterintuitive finding given that Anthropic's own benchmarks show Opus 4.7 outperforming its predecessor on SWE-bench Pro (64.3%) and Terminal-Bench (69.4%). The discrepancy likely reflects the gap between controlled benchmark conditions and the messy, cross-language dependency chains that characterize real production environments, where tensor operations and foreign-function interface boundaries create reasoning demands that synthetic evaluations may not fully capture.

The capacity narrative embedded in the review carries its own analytical weight. The reviewer observes that Opus 4.6, several months prior, could sustain 80–90 hours of productive weekly output, and over 135 hours during promotional 2X limit periods. This suggests that Anthropic has been actively tuning compute allocation policies independently of model capability updates. That a newer, ostensibly more capable model (4.7) delivers fewer usable hours than its predecessor did at its peak is more consistent with deliberate throttling or resource rebalancing on Anthropic's infrastructure than with any intrinsic model degradation. This is a meaningful concern for power users whose workflows depend on predictable throughput, and it underscores that model versioning and platform resource policy are distinct variables that users must track separately.

In the broader context of AI development, Opus 4.7 occupies an instructive position: it is a production-grade, incrementally refined model that sits below Anthropic's own Claude Mythos Preview on virtually every key benchmark, yet it remains the current generally available flagship. This tiered release structure — where a preview model meaningfully outperforms the shipping product on SWE-bench Verified (93.9% vs. 87.6%) and Terminal-Bench (82.0% vs. 69.4%) — reflects an industry-wide pattern in which frontier labs maintain capability headroom in preview tiers to signal roadmap momentum while managing compute costs and safety validation timelines for production rollouts. Anthropic's addition of production cybersecurity safeguards and improved prompt injection resistance to Opus 4.7 further illustrates how safety hardening now travels as a co-equal engineering priority alongside raw capability gains, rather than as a downstream concern.

For practitioners evaluating Opus 4.7 as an agentic workhorse, the honest picture is one of genuine progress within a constrained envelope. The model is more reliable for long-running automations, CI/CD pipelines, and vision-heavy computer-use tasks than its predecessor, and its improved honesty and proactive clarification behavior reduce iteration overhead in open-ended workflows. Yet users with demanding cross-language or tensor-heavy technical workloads may find Opus 4.6 — under favorable quota conditions — still competitive or superior for those specific use cases. The larger takeaway is that model capability and platform resource availability must both be evaluated when assessing generation-over-generation value, a nuance that benchmark-focused release notes routinely understate.
