Extreme hallucinations output from Opus today (1st May) and yesterday

A Claude Opus 4.7 user reported severe hallucinations and task failures on April 30 and May 1. The model ignored direct instructions, such as renaming a specific class or removing CSS styles, and made unrelated modifications instead. Token consumption also spiked: a single prompt exhausted a five-hour token allowance in under ten minutes, even though the user was relying on Claude Mem to conserve tokens.

Detailed Analysis

A Claude MAX Plan user reported significant performance degradation from the Opus 4.7 model on April 30 and May 1, 2026, describing persistent hallucinations, task non-compliance, and anomalous token consumption that diverged sharply from the model's earlier behavior. The complaints fall into two distinct failure categories. The first is instruction-following failure: the model ignored explicit directives and substituted unrelated or contrary actions, such as renaming the wrong class or adding CSS overrides with `!important` rather than removing styles as directed. The second is a dramatic spike in token usage, in which a single prompt exhausted a full five-hour token allocation in under ten minutes, an event the user describes as unprecedented in their experience with the Claude Mem memory management tool.

The combination of reported symptoms is notable because it suggests potential issues at multiple levels simultaneously. Hallucination and instruction-following degradation could stem from model-side changes, context window management failures, or prompt interference, but the token consumption anomaly is a separate, more concrete engineering signal. Claude Mem is designed to compress and externalize memory in order to reduce active context load; if the model began ignoring that compression layer entirely, it would process significantly more tokens per turn, which is consistent with the reported drain. This could indicate a change in how the model handles tool-use outputs or memory injection, rather than a pure reasoning degradation.
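The arithmetic behind that last point can be sketched with a toy model. It assumes every turn resends the conversation so far as input, and that a compression layer would cap the resent history at a fixed-size summary. The function name, the turn sizes, and the summary size are all hypothetical; nothing here describes how Claude Mem or Anthropic's metering actually works.

```python
def cumulative_input_tokens(turn_sizes, summary_size=None):
    """Total input tokens processed across a conversation (toy model).

    turn_sizes:   tokens added by each turn (invented figures).
    summary_size: if set, prior history is capped at this many tokens
                  before each turn, standing in for a compression layer;
                  if None, the full history is resent every turn.
    """
    total = 0
    history = 0  # tokens accumulated from earlier turns
    for size in turn_sizes:
        if summary_size is None:
            context = history                     # full history resent
        else:
            context = min(history, summary_size)  # capped by the summary
        total += context + size
        history += size
    return total


# Twenty turns of 2,000 tokens each (hypothetical workload).
turns = [2000] * 20
uncompressed = cumulative_input_tokens(turns)
compressed = cumulative_input_tokens(turns, summary_size=1500)
print(uncompressed, compressed, round(uncompressed / compressed, 1))
```

Under these invented numbers the uncompressed conversation processes roughly six times as many input tokens, and the gap widens with every additional turn because the uncompressed total grows quadratically while the compressed one grows linearly. That is the shape of drain one would expect if a compression layer stopped being applied, though it does not show that this is what happened here.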

User reports of sudden, unexplained model behavior changes are a recurring pattern in the Claude community, and they are usually attributed to one of several suspected causes: silent model updates or fine-tune rollouts, infrastructure-level load balancing that routes requests to different model versions, or rate-limit policy changes that alter how MAX Plan allocations are metered. The user's suspicion that Anthropic may have "limited the model" reflects a broader frustration among power users that the company does not always communicate capability or policy changes transparently, leaving them to diagnose degradations empirically through their own workflows.

This incident sits within a wider industry tension around frontier model reliability and the gap between marketed capabilities and day-to-day production consistency. As Anthropic positions Opus as its most capable and premium tier — commanding the highest token costs and targeting sophisticated, agentic use cases like the one described — reliability expectations are correspondingly elevated. Agentic workflows involving code modification, file management, and multi-step UI changes are particularly intolerant of instruction-following failures, since a single misdirected action can cascade into compounding errors across a codebase. The reported behavior, if representative of a broader rollout issue rather than an isolated edge case, would represent a meaningful reliability regression precisely in the model tier where Anthropic can least afford one.

The post also illustrates the structural information asymmetry that characterizes the current AI product landscape. Users on premium plans operate sophisticated, high-dependency workflows built atop model behavior that can shift without notice, and they have no reliable mechanism to distinguish between their own configuration errors, prompt-layer issues, and genuine model regressions. Without a status notice, changelog entry, or regression acknowledgment from Anthropic that covers the behavior they observe, users are pushed toward adversarial interpretations; the user's invocation of "shady" behavior reflects not paranoia but a rational response to opacity. As AI systems become deeper infrastructure for professional workflows, the expectation for transparency around model changes will only intensify.
