I can't get with 4.7 — Claude Learning Daily

A user reported experiencing significant quality degradation with Claude 4.7 compared to earlier versions, citing laziness, hallucinations, fake citations, overly defensive reasoning, and response padding as primary issues. The model's inflexibility when working through problems and apparent unwillingness to complete requested tasks led the user to consider switching platforms after being satisfied with prior versions. The complaint highlighted frustration that product quality appeared to have declined despite the user's continued loyalty.

Detailed Analysis

A user on the r/Anthropic subreddit has published a pointed critique of Claude's 4.7 model, articulating a sharp decline in quality relative to earlier versions — specifically Opus 4.5 and 4.6 — across several concrete dimensions of performance. The complaints center on four distinct failure modes: what the user characterizes as "laziness" or truncated outputs that fall short of requested scope, an inflexibility in reasoning where the model refuses to revise or abandon initial positions during collaborative intellectual exploration, response padding designed to simulate depth rather than deliver it, and — most critically — recurring hallucinations paired with fabricated citations that persist even after the user explicitly identifies and calls out the problem. The user frames this last failure as the precise reason they previously abandoned ChatGPT, making its reemergence in Claude feel like a regression rather than a lateral tradeoff.

The emotional tenor of the post carries analytical weight beyond mere frustration. The user was, by their own account, a highly satisfied customer through two prior version iterations, which distinguishes this from baseline dissatisfaction or unrealistic expectations. The concern is not that Claude has always been inadequate, but that a specific and recent version update introduced or amplified behaviors that compromise the model's usefulness for exploratory, iterative intellectual work. The reference to "padded responses" is particularly telling — this suggests the model may be optimizing for surface signals of quality (length, hedging language, apparent thoroughness) rather than substantive alignment with user intent, a known failure mode associated with certain reinforcement learning from human feedback dynamics.

The post connects directly to a broader and well-documented tension in large language model development: the difficulty of maintaining consistent behavioral quality across model versions. Anthropic, like other frontier AI labs, continuously updates and refines its models, and these updates can introduce regressions in specific capability domains even as they improve performance on benchmarks. The user's complaint about citation hallucination is particularly resonant — this failure mode has proven stubbornly persistent across the industry, and its recurrence after user correction suggests the model is not updating its internal confidence calibration within a session in the way a user reasonably expects. The observation that calling out hallucinations fails to prevent their recurrence points to a gap between the model's apparent acknowledgment of error and its actual behavioral adjustment.

The broader significance of this type of user feedback lies in what it reveals about the relationship between model capability and user trust over time. A user who has invested in a platform, developed workflows around it, and expressed strong satisfaction is a qualitatively different data point than a new or ambivalent user. When a committed power user begins evaluating exit options — particularly back toward a previously abandoned competitor — it signals that perceived quality degradation has crossed a practical threshold. The user's explicit reluctance to "change platforms every couple months" reflects the real switching costs involved in AI tool adoption, and underscores that model updates are not experienced in isolation but against the accumulated baseline of prior interactions. For Anthropic, the risk is not merely losing one user, but eroding the compounding trust that distinguishes a habitual tool from an occasionally useful one.

Read original article →

Detailed Analysis

Don't Miss a Deploy