Detailed Analysis
The core limitation behind the article's argument is Claude's stateless, context-window-bound architecture. This structural constraint makes Claude an unreliable long-term collaborator for software development unless the developer adopts deliberate compensatory practices. Because Claude retains no memory between sessions, every conversation begins from a blank slate. Architectural decisions, edge-case discussions, dependency reasoning, and tradeoff negotiations accumulated over weeks of collaboration cease to exist the moment the user closes the browser tab. The author frames this not as a bug or a failing specific to Claude, but as a fundamental property of how current large language model assistants are designed to operate. That property creates a dangerous illusion of continuity where none exists.
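To make the constraint concrete, here is a minimal Python sketch that models a session as nothing more than a bounded list of messages. `Session`, `visible`, the four-message window, and the `parse_order()` decision are all hypothetical stand-ins. Real context windows are measured in tokens, not messages. The sketch shows only that an assistant can reason about nothing outside the list it is handed, whether a decision was pushed out by a long conversation or never carried into a new session.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical model of one chat session with a bounded context window.

    The assistant can only "see" the messages held in this object; nothing is
    shared between Session instances, mirroring a fresh browser tab.
    """

    max_messages: int = 4  # stand-in for a token budget (an assumption)
    messages: list[str] = field(default_factory=list)

    def send(self, text: str) -> list[str]:
        """Append a message and return what the assistant can currently see."""
        self.messages.append(text)
        # Once the window is full, the oldest messages silently fall out.
        self.messages = self.messages[-self.max_messages:]
        return list(self.messages)


def visible(context: list[str], keyword: str) -> bool:
    """True if any message in the visible context mentions the keyword."""
    return any(keyword in message for message in context)


# Week 1: the decision is stated and is still inside the window.
week1 = Session()
week1.send("Decision: parse_order() stays synchronous; billing calls it in a loop.")
print(visible(week1.send("Refactor parse_order()."), "synchronous"))  # True

# A long conversation pushes the decision out of the window.
for i in range(4):
    week1.send(f"Unrelated question {i}")
print(visible(week1.send("Refactor parse_order() again."), "synchronous"))  # False

# Week 2: a new session starts from a blank slate.
week2 = Session()
print(visible(week2.send("Refactor parse_order()."), "synchronous"))  # False
```

In both failure cases the model is not malfunctioning. The information is simply not in the list it receives.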
The proposed remedy, Manual-First Development (MDD), reframes documentation as a persistence mechanism rather than bureaucratic overhead. Under this workflow, the human developer is responsible for externalizing and maintaining the context that Claude cannot carry forward on its own. Documentation becomes the substrate through which Claude "remembers" prior decisions across sessions. Critically, it also keeps the human developer as the authoritative owner of architectural intent. This addresses a subtler failure mode than simple forgetfulness. When Claude confidently refactors a function without knowing about its hidden dependencies, it is not malfunctioning. It is operating exactly as designed, using only the information currently in its context window. MDD attempts to close that gap by making the relevant context explicit and portable.
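One way to picture MDD's persistence mechanism is as a small, human-owned store of decisions that gets rendered into the opening message of each new session. The sketch below is a hypothetical illustration, not the article's tooling. The `Decision` record, its `touches` field, the `relevant` and `render_preamble` helpers, and the example decisions are all assumptions. Filtering by the symbol about to change is what would let a hidden dependency, like the one in the refactoring example, reach the context window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One human-authored record of architectural intent (hypothetical format)."""

    title: str
    rationale: str
    touches: tuple[str, ...]  # symbols or modules this decision constrains


def relevant(decisions: list[Decision], symbol: str) -> list[Decision]:
    """Select only the decisions that constrain the symbol about to be changed."""
    return [d for d in decisions if symbol in d.touches]


def render_preamble(decisions: list[Decision]) -> str:
    """Render decisions as plain text to paste at the top of a new session."""
    lines = ["Project decisions (maintained by the developer; treat as binding):"]
    for d in decisions:
        lines.append(f"- {d.title}: {d.rationale} [affects: {', '.join(d.touches)}]")
    return "\n".join(lines)


# Hypothetical decisions the developer wrote down in earlier sessions.
DECISIONS = [
    Decision(
        title="parse_order() stays synchronous",
        rationale="billing.reconcile() calls it in a tight loop",
        touches=("parse_order", "billing.reconcile"),
    ),
    Decision(
        title="No ORM in the reporting module",
        rationale="queries are hand-tuned for the warehouse",
        touches=("reporting",),
    ),
]

print(render_preamble(relevant(DECISIONS, "parse_order")))
```

Here only the first decision is selected, so the preamble is a header line plus the `parse_order()` record. The design choice is that the human curates and owns the records. The assistant only ever receives a rendered copy, which matches the article's point about keeping architectural intent with the developer.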
The research context broadens this critique considerably, surfacing a cluster of documented reliability issues that compound the memory problem. Anthropic's own postmortem acknowledged that between August and September 2025, infrastructure bugs caused server misrouting that affected 30% of Claude Code users, output corruption that inserted foreign-language tokens into responses, and syntax errors in generated code. All of these were difficult to detect because Claude's error-recovery behavior hid the underlying failures from standard benchmarks. Separately, user reports and Hacker News discussions describe Claude hallucinating non-existent functions, giving debugging advice that ignores extensive context the user supplied, and producing inconsistent explanations across sessions. These are not edge-case anecdotes. They represent a pattern of unreliability with material consequences for developers who treat Claude as a primary rather than auxiliary contributor to production codebases.
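Hallucinated functions are among the easier failures to catch mechanically. As one hedged illustration, not something the article or the cited reports propose, a reviewer could parse generated Python with the standard-library `ast` module and flag plain calls to names that the snippet never defines, imports, or binds. `undefined_calls` is a hypothetical helper. It deliberately ignores attribute calls such as `json.loadz()` and cannot see names injected by star imports, so it supplements human review rather than replacing it.

```python
import ast
import builtins


def undefined_calls(source: str) -> set[str]:
    """Return names called as plain functions that the snippet never defines,
    imports, or binds, and that are not Python builtins.

    Coarse by design (hypothetical review aid): attribute calls like obj.m()
    are skipped, and star imports or exec() can hide real definitions.
    """
    tree = ast.parse(source)
    bound = set(dir(builtins))
    called = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import os.path" binds "os"; "import x as y" binds "y".
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)  # assignments, loop targets, "with ... as f"
        elif isinstance(node, ast.arg):
            bound.add(node.arg)  # function and lambda parameters
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            called.add(node.func.id)
    return called - bound


# Hypothetical assistant output that calls a helper it never wrote.
generated = '''
import json

def load(path):
    with open(path) as f:
        return json.load(f)

def main():
    data = load("orders.json")
    return normalize_orders(data)
'''

print(undefined_calls(generated))  # {'normalize_orders'}
```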
The productivity-versus-mastery tension identified in Anthropic's own research adds another dimension to the critique. The company's analysis of Claude.ai usage data found that while AI assistance accelerated task completion by up to 80%, it corresponded with a 17% decline in quiz performance on recently learned coding concepts (nearly two letter grades). This suggests that speed gains come at the cost of genuine skill development and deep understanding. The finding is particularly pointed for software development, where a developer who relies on Claude to generate code without fully understanding it accumulates invisible technical debt: they cannot effectively audit, debug, or extend what they never truly comprehended. The MDD framework implicitly acknowledges this risk by insisting on human-authored documentation as a forcing function for understanding.
Taken together, these issues position the "just using Claude" failure mode as emblematic of a broader, industry-wide challenge in AI-assisted development: the gap between demonstrated capability on isolated tasks and reliable performance in sustained, stateful, real-world workflows. Claude's context-window limitation is an architectural reality shared by virtually all current frontier LLMs, not a deficiency unique to Anthropic's product. The emergence of structured workflows like MDD reflects a pragmatic recognition by the developer community that the current generation of AI assistants requires significant human scaffolding to function safely in professional contexts. AI systems are becoming more agentic and autonomous, and Anthropic and its competitors are investing heavily in longer context windows, memory modules, and tool-use frameworks. As that happens, the question of how to maintain human oversight and project coherence without sacrificing the genuine productivity gains AI offers will only become more pressing.