Detailed Analysis
AMD's Senior Director of AI, Stella Laurenzo, issued a pointed public rebuke of Anthropic's Claude Code tool in early April 2026, declaring it untrustworthy for complex engineering work after months of documented performance degradation. Posting under the GitHub handle "StellarAccident," Laurenzo backed her criticism with substantial empirical data drawn from her team's analysis of over 6,800 coding sessions, 234,760 tool calls, and 17,871 thinking blocks. The data revealed two significant behavioral regressions: a sharp rise in what she termed "stop-hook violations"—instances where Claude prematurely abandons tasks, avoids taking responsibility, or seeks unnecessary permissions—climbing from zero in early March 2026 to approximately 10 per day shortly thereafter; and a troubling shift from a research-first to an edit-first coding approach, with code reads prior to edits dropping from an average of 6.6 to just 2 by late March. The practical consequences were lower-quality code, weaker adherence to project conventions, and degraded reliability during extended sessions. Notably, every senior engineer on Laurenzo's team independently confirmed the same problems, prompting AMD to migrate to an unnamed alternative provider while publicly calling on Anthropic to address the issues.
The regression Laurenzo identified appears to coincide with Anthropic's deployment of a feature called "thinking redaction" (redact-thinking-2026-02-12), rolled out in Claude Code versions 2.1.69 and 2.1.20 in early March 2026. Users and engineers broadly suspect this mechanism truncates the model's actual reasoning process, even though Anthropic has maintained that it only hides reasoning outputs from the user interface without reducing the underlying thinking. Anthropic engineer Boris Cherny entered the public conversation to defend the company's position, reiterating that thinking redaction does not diminish actual model reasoning, and announced that the team is testing elevated effort levels—such as an effort=85 default, with higher settings for Teams and Enterprise tiers—designed to enable deeper thinking at the cost of additional tokens and latency. Cherny also praised Laurenzo's analysis as a substantive contribution to the discussion, signaling that the critique was taken seriously internally. However, the gap between Anthropic's technical assurances and the reproducible performance drop documented by a major semiconductor company's AI team represents a credibility challenge that public praise alone cannot fully resolve.
Laurenzo's critique extended beyond the immediate regression to raise structural transparency concerns about how AI coding tools expose their internal reasoning to enterprise customers. She called for Anthropic to surface thinking token counts on a per-request basis and proposed a "max thinking tier" subscription model capable of allocating 20,000 thinking tokens for complex engineering workflows, contrasting that with basic tasks that might require only 200 tokens. This framing reflects a growing expectation among sophisticated enterprise AI consumers that they should have granular visibility into and control over model behavior, rather than accepting opaque defaults. The demand is particularly significant coming from AMD, a company both competing in and deeply invested in the AI hardware ecosystem, lending the critique institutional weight beyond individual user frustration.
The episode situates itself within a broader pattern of AI coding assistant regression complaints that emerged across the industry through late 2025 and into 2026. Reddit communities and developer forums logged similar concerns about degraded performance in tools from multiple vendors following updates ostensibly related to safety filters, token optimization, or behavioral tuning. For Anthropic specifically, the timing compounds a difficult period: the Claude Code source code experienced a public leak, and the platform faced separately reported token usage surges that strained reliability. Taken together, these incidents suggest that as AI coding tools mature from novelty to critical infrastructure in engineering workflows, the tolerance for unexplained behavioral shifts has sharply decreased. Enterprise teams now treat AI coding assistants with the same expectation of change-management discipline—documented behavioral specifications, regression testing, and transparent rollout notes—that they apply to any production software dependency.
The AMD incident underscores a fundamental tension in deploying frontier AI models as developer tooling: the iterative, often opaque nature of model updates conflicts directly with the stability and predictability enterprise engineering teams require. Anthropic's response—publicly engaging with the critique, offering technical clarification, and announcing experimental mitigations—demonstrates a degree of responsiveness, but the fact that a company of AMD's stature felt compelled to abandon the tool while waiting for a fix illustrates the reputational and competitive stakes involved. As AI coding assistants become embedded in mission-critical development pipelines at hardware and semiconductor companies working on next-generation AI infrastructure, the feedback loop between tool reliability and enterprise trust is becoming a defining battleground for the leading AI labs.
Read original article →