AMD AI director says Claude Code is becoming dumber and lazier since update

Detailed Analysis

AMD's Senior Director of AI, Stella Laurenzo, published a formal GitHub complaint on April 4, 2026, asserting that Claude Code "cannot be trusted to perform complex engineering tasks" following what her team identified as a measurable performance regression beginning in early March 2026. The criticism was not anecdotal: Laurenzo and her team backed their claims with an analysis of 6,852 Claude Code sessions comprising 234,760 tool calls. Their data showed a sharp decline in the average number of code reads Claude performed before making changes, from 6.6 to just 2 by the end of March. Over the same period, "stop-hook violations" (signs that the model was abandoning tasks prematurely), together with permission-seeking behavior and reasoning shortcuts, rose from zero before March 8 to an average of 10 incidents per day by month's end. Claude also began rewriting entire files rather than making targeted, surgical edits, a pattern inconsistent with professional engineering workflows.
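The reads-before-edit metric described above can be illustrated with a minimal sketch. This is not the AMD team's method, which the article does not detail. The sketch assumes each session is available as an ordered list of tool-call names, and the tool names, the split into read-type and edit-type calls, and the rule of resetting the counter after every edit are all illustrative assumptions.

```python
from statistics import mean

# Assumed tool names; real session logs may label or group these differently.
READ_TOOLS = {"Read", "Grep", "Glob"}
EDIT_TOOLS = {"Edit", "Write"}


def reads_before_edits(tool_calls):
    """Count read-type calls preceding each edit-type call in one session.

    The counter resets after every edit, so each value is the number of
    reads made since the previous edit (or since the session started).
    Calls that are neither reads nor edits are ignored.
    """
    counts = []
    reads = 0
    for name in tool_calls:
        if name in READ_TOOLS:
            reads += 1
        elif name in EDIT_TOOLS:
            counts.append(reads)
            reads = 0
    return counts


def mean_reads_per_edit(sessions):
    """Average reads-before-edit across all edits in all sessions."""
    counts = [c for session in sessions for c in reads_before_edits(session)]
    return mean(counts) if counts else 0.0


if __name__ == "__main__":
    # Made-up sessions for illustration only, not data from the complaint.
    sessions = [
        ["Read", "Grep", "Read", "Edit"],
        ["Read", "Edit", "Write"],
    ]
    print(mean_reads_per_edit(sessions))
```

Grouping the same per-edit counts by session date rather than pooling them would give the kind of before-and-after comparison the complaint reports, under the same assumptions.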

Laurenzo attributed the degradation to version 2.1.69 of Claude Code, which introduced thinking content redaction, hypothesizing that this change reduced the depth of the model's internal reasoning process. The hypothesis is technically plausible: thinking tokens, the intermediate reasoning a model generates before producing its final response, are understood to be closely correlated with performance on complex, multi-step tasks such as software engineering. By redacting or limiting thinking content, Anthropic may have inadvertently constrained the reasoning budget available to the model for tasks that demand sustained, layered reasoning. Laurenzo proposed a concrete remedy: a tiered subscription model that would allow enterprise users with demanding workflows to access higher thinking token allocations, noting that the current uniform model fails to differentiate between users who need 200 thinking tokens per response and those requiring 20,000.

The complaint resonated well beyond AMD's internal engineering team. Reddit communities and independent Claude Code users reported nearly identical experiences, with some professionals stating they could no longer recommend the tool to clients in good conscience. Comparative criticism emerged as well, with users noting that OpenAI's Codex was outperforming Claude Code on complex coding benchmarks — a pointed competitive rebuke given that Claude Code had previously been regarded as one of the strongest AI coding assistants available. The breadth of the community response suggests Laurenzo's data reflects a systemic issue rather than an edge case particular to AMD's hardware-focused engineering environment.

The episode sits within a broader and increasingly prominent tension in the AI industry between cost optimization and performance consistency. As AI providers scale their infrastructure and manage computational costs, decisions around context windows, reasoning depth, and token allocation carry real downstream consequences for enterprise users. What appears on the backend as a reasonable engineering trade-off — limiting thinking token usage to reduce inference costs — can manifest on the user side as an inexplicable and frustrating capability regression. This dynamic is compounded by a lack of transparency: users cannot readily inspect thinking token depth, making it difficult to diagnose whether a model's apparent laziness is the result of a deliberate configuration change, a software bug, or a model update.

The AMD incident highlights a maturing set of expectations among enterprise AI consumers. Technical leaders at major companies are no longer accepting degraded performance passively; they are instrumenting their usage at scale, publishing empirical findings, and demanding structural accountability from AI providers. Laurenzo's call for tiered thinking token access and greater transparency signals a broader market pressure on Anthropic — and the AI industry at large — to treat enterprise-grade reliability and configurability as first-class product features rather than afterthoughts. As AI coding tools become deeply embedded in professional engineering workflows, the cost of undisclosed regressions is no longer merely inconvenient; it is a reputational and competitive liability for the providers themselves.
