Why AI model performance is worsening

An author argues that major AI companies are reducing model capabilities to cut costs as they prepare for IPOs and pursue profitability. The article notes that proving deliberate performance reduction is difficult because no independent mechanisms regularly verify whether model capabilities diminish after release.

Detailed Analysis

A Reddit post in the r/Anthropic community has articulated a theory gaining traction among AI users: that companies like Anthropic and OpenAI are deliberately degrading their model performance in pursuit of profitability ahead of anticipated IPO filings, a process the author frames using Cory Doctorow's concept of "enshittification." The core argument holds that as user and enterprise lock-in increases, financial incentives shift away from product improvement and toward cost reduction, with cheaper, less capable model versions served to paying customers who have limited ability to switch platforms. The post attracted significant discussion, with the author acknowledging uncertainty and inviting critique of the argument's logic.

The post raises several legitimate observations alongside claims that deserve scrutiny. The general complaint about perceived model degradation is not new — similar concerns were widely documented in 2023 and 2024 around OpenAI's GPT-4, with at least one Stanford University study examining behavioral shifts in model outputs over time after deployment. The observation that independent, ongoing capability auditing of production AI models is largely absent is also accurate and represents a genuine gap in AI governance infrastructure. However, the specific investment figures cited — $30 billion from NVIDIA to OpenAI and $10 billion into Anthropic — appear to misrepresent the scale and nature of NVIDIA's actual financial relationships with these companies, undermining the post's analytical credibility on that point.

The post's central causal argument contains a notable logical tension. If Anthropic and OpenAI are deliberately degrading performance to cut costs ahead of IPOs, this strategy could easily backfire: deteriorating product quality is precisely the kind of reputational and churn risk that public market investors scrutinize. Enterprise customers and developers, the highest-value subscribers, are also the most capable of benchmarking model performance internally and migrating to competitors. The enshittification thesis is more persuasive when applied to platforms with true network-effect lock-in — social media, e-commerce marketplaces — than to AI inference, where the underlying model weights are increasingly commoditized and open-source alternatives from Meta, Mistral, and others provide genuine competitive pressure.

That said, the structural financial pressures the post identifies are real. Both Anthropic and OpenAI have operated at substantial losses, and the transition toward profitability inherently involves cost optimization decisions that could include serving smaller, more efficient model variants under existing product names without explicit disclosure. The absence of regulatory mandates requiring model versioning transparency or performance disclosure to consumers means companies retain wide discretion over what users actually receive versus what is advertised. This accountability gap is particularly acute for subscribers who lack the technical expertise to run systematic evaluations of output quality over time.

The broader trend the post gestures at — the maturation of the generative AI market from a growth-at-all-costs phase into one governed by unit economics — is a legitimate and consequential development. As AI companies approach public markets, pressures to demonstrate sustainable margins will intensify, and the mechanisms for doing so almost certainly include inference cost reduction through model distillation, quantization, and selective capability deployment. Whether this constitutes deliberate "enshittification" or ordinary product evolution under financial constraint is partly a semantic question, but the underlying dynamic — that commercial incentives and user-facing quality can diverge as markets consolidate — is well-supported by the history of technology platforms and warrants serious, ongoing scrutiny from researchers, regulators, and consumers alike.

Read original article →

Detailed Analysis

Don't Miss a Deploy