Claude has been very nerfed recently?

According to a user report, Claude's Opus 4.6 model has recently begun ignoring an instruction to compress tailored resume outputs to a single page, skipping the constraint approximately 70% of the time and acknowledging the instruction only after the fact. The user, who pays $100 per month for the service, expressed frustration about what they perceive as a silent degradation in model performance and in adherence to explicit formatting requirements.

Detailed Analysis

A Reddit user on the r/Anthropic subreddit reports a noticeable degradation in Claude Opus's ability to follow a persistent project instruction — specifically, condensing a tailored resume to one page by dropping less relevant content. Where the model previously executed this task reliably, the user now observes it failing approximately 70% of the time, producing two-page outputs and acknowledging afterward that it had simply skipped the instruction on the first pass. The user pays $100 per month for Claude access and frames the experience as a silent, undisclosed quality regression — a concern echoed by a wider community of developers and power users who have noticed similar behavioral inconsistencies across Claude's model lineup.

The complaint aligns with a broader pattern of user-reported performance issues that has attracted independent monitoring efforts. The site IsItNerfed.org logged a 46% failure rate for Claude Sonnet 4.5 on coding benchmarks — up from 37% for Sonnet 4 — with elevated failures concentrated around February 2026. Developers on Hacker News and other forums have described "dumb" outputs despite max-effort prompts, duplicate responses, and a sense that Opus now underperforms relative to Sonnet, inverting the expected capability hierarchy. These reports span months and cover multiple model versions, suggesting the issue is not isolated to a single user's workflow or a one-time glitch. Importantly, IsItNerfed also noted that Claude's performance appeared to recover by February 19, 2026, indicating that whatever caused the degradation was at least partially transient.

Anthropic has publicly addressed at least one discrete episode of degraded performance, acknowledging that Claude experienced measurable quality issues between August 29 and September 4, 2025. The company attributed the problem to infrastructure bugs rather than any intentional reduction in model capability driven by cost or load management. This distinction matters: a bug implies an unintended failure that the company is motivated to fix, while a deliberate "nerf" would imply a policy decision to deliver less in exchange for the same subscription price. Anthropic's framing has not fully satisfied critics, however, because infrastructure explanations do not account for persistent, low-level complaints that predate and postdate the acknowledged incident window.

The broader debate exposes a fundamental tension in how AI companies communicate model changes to paying subscribers. Unlike traditional software, where version numbers and changelogs make regressions traceable, large language model deployments are updated continuously and opaquely — users interact with the same product name while the underlying model weights, system prompts, or infrastructure configurations may shift without notice. This creates an asymmetry of information: Anthropic possesses telemetry data that users cannot access, making it nearly impossible for individuals to distinguish subjective drift in expectations from objective capability decline. The absence of reproducible, user-accessible benchmarks means complaints remain anecdotal, even when third-party monitors like IsItNerfed suggest real signal in the noise.
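One way an individual user could move a complaint like this from anecdote toward something reproducible is to rerun the same prompt a fixed number of times, record whether each output violated the instruction, and report the failure rate with an uncertainty range. The sketch below is only an illustration under stated assumptions: the function names `failure_rate` and `wilson_interval` are hypothetical, the pass/fail outcomes are assumed to be judged by hand, and the sample numbers are invented to mirror the "roughly 70%" claim rather than taken from any measurement. It uses the standard Wilson score interval for a binomial proportion.

```python
import math


def failure_rate(outcomes):
    """Return the fraction of failed trials.

    `outcomes` is a list of booleans where True means the run violated
    the instruction (for example, the resume came out longer than one page).
    """
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)


def wilson_interval(failures, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    Returns a (low, high) tuple of proportions between 0 and 1.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= failures <= trials:
        raise ValueError("failures must be between 0 and trials")
    p = failures / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Invented illustration, NOT measured data: 14 failures in 20 runs.
    outcomes = [True] * 14 + [False] * 6
    rate = failure_rate(outcomes)
    low, high = wilson_interval(sum(outcomes), len(outcomes))
    print(f"failure rate {rate:.0%}, approx. 95% interval {low:.0%} to {high:.0%}")
```

With only a couple of dozen runs the interval stays wide, which is part of why a handful of bad sessions cannot by itself separate a real regression from ordinary run-to-run variation. Repeating the same fixed test on different dates would be needed to say anything about change over time.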

This episode reflects a maturing challenge for the commercial AI industry as subscription prices climb and enterprise reliance deepens. A $100-per-month user reporting that a flagship model now skips explicit, persistent instructions 70% of the time represents a credibility problem for Anthropic regardless of root cause. As Claude competes against OpenAI's GPT-4o, Google's Gemini, and an expanding field of capable models, reliability and instruction-following consistency are not peripheral features — they are core value propositions. The user's frustration underscores a growing expectation that AI providers adopt more transparent communication practices around model updates, performance regressions, and infrastructure incidents, particularly as enterprise and prosumer customers build critical workflows around model behavior they have reason to expect will remain stable.


