Claude builds fast but drifts fast. I wrote a full SOP to fix that — feedback welcome

A developer observed that Claude drifts during longer development sessions: it starts sharp but, around task 3-4, begins filling in gaps and reinterpreting requirements. To address this, they wrote a Spec-Driven Development SOP modeled on change-control practices from regulated industries. The SOP combines a formal document stack, five distinct AI roles with explicit boundaries, test-first development, and a session start protocol, and the author has tested it on a personal project for several months.

Detailed Analysis

A software developer working with Claude on extended coding sessions identified a recurring behavioral pattern in which the model begins sessions performing well but progressively drifts from stated requirements as sessions continue — not through outright hallucination, but through what the author describes as "untethered" behavior: gap-filling, requirement reinterpretation, and unsolicited feature addition. To address this, the developer formalized a methodology called Spec-Driven Development (SDD), open-sourced on GitHub, which applies change-control principles borrowed from regulated industries to AI-assisted software development.

The SDD framework centers on a hierarchical document stack — moving from a numbered requirements specification (SPEC.md) through a design document, development plan, tests, and finally code — where each artifact serves as working memory for the AI rather than a polished deliverable. Critically, the system enforces role separation: five named roles (Planner, Test Designer, Developer, Spec Reviewer, Code Reviewer) are each assigned to a single agent per task, with explicit "does not" boundaries preventing role conflation. The methodology also mandates test-first development in which the Test Designer produces failing tests derived directly from functional requirements before any implementation begins, and a Spec Reviewer validates those tests against the requirements before the Developer writes a single line of code. A Session Start Protocol requires every new session to begin with a position report, preventing the context fragmentation that compounds drift across multi-session projects.
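The traceability idea behind the test-first step can be illustrated with a small check that every numbered requirement is cited by at least one test, and that no test cites a requirement the spec does not contain. This is only a minimal sketch and not the author's tooling: the `FR-<number>` ID format, the function names, and the sample loan-tracker requirements are all hypothetical, since the summary says only that SPEC.md is numbered.

```python
import re

# Assumption: requirements are tagged "FR-<number>". The source only says
# the spec is numbered, so this ID format is hypothetical.
REQ_ID = re.compile(r"\bFR-\d+\b")


def requirement_ids(text: str) -> set[str]:
    """Return every requirement ID mentioned in a block of text."""
    return set(REQ_ID.findall(text))


def untested_requirements(spec_text: str, test_text: str) -> list[str]:
    """Requirements in the spec that no test mentions."""
    return sorted(requirement_ids(spec_text) - requirement_ids(test_text))


def untraced_tests(spec_text: str, test_text: str) -> list[str]:
    """IDs cited by tests but absent from the spec (possible scope creep)."""
    return sorted(requirement_ids(test_text) - requirement_ids(spec_text))


# Hypothetical sample data, invented for illustration only.
SPEC = """\
FR-1: Record a new loan with principal and rate.
FR-2: List outstanding loans.
FR-3: Mark a loan as repaid.
"""

TESTS = """\
def test_record_loan():  # covers FR-1
    ...
def test_list_loans():  # covers FR-2
    ...
def test_export_csv():  # covers FR-9
    ...
"""

if __name__ == "__main__":
    print("untested:", untested_requirements(SPEC, TESTS))
    print("untraced:", untraced_tests(SPEC, TESTS))
```

A Spec Reviewer role could run a check of this kind before the Developer starts: an untested requirement signals a gap in the tests, while an untraced test ID is one way to surface the unsolicited features the author describes.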

The underlying problem the author identifies, context drift in long AI coding sessions, is a commonly reported limitation of large language models operating over long interactions. As conversational context grows and task complexity compounds, models like Claude can lose fidelity to original constraints and default to plausible-sounding completions instead of adhering precisely to the stated specification. The author's framing of this as "untethered" behavior rather than hallucination is a meaningful distinction: the model is not fabricating facts but exercising its own judgment about what ought to be built, which can be equally disruptive in software engineering, where requirement precision is critical.

The SDD approach situates itself within a broader emerging discipline of "prompt engineering for software development," which has produced various frameworks including agentic coding systems, multi-agent orchestration patterns, and structured context management strategies. What distinguishes this particular approach is its explicit borrowing from industrial change-control methodology — sectors like pharmaceuticals, aerospace, and financial services have long required formal traceability between requirements and implementation to manage regulatory risk. Applying that discipline to AI-assisted development acknowledges that LLM-based coding tools, while powerful, introduce a novel class of compliance risk: the model may deliver functional code that does not satisfy the original specification. The author's eight-rule framework, particularly the prohibition on implementation without an approved design document, attempts to close that gap structurally rather than through ad hoc prompting corrections.
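One way to picture a structural gate of this kind is a small state tracker that refuses to let a stage begin until everything above it in the document stack is approved. The sketch below is an assumption-laden illustration, not the author's implementation: the `ChangeControl` class, its method names, and the choice to gate every stage (the summary only confirms the rule about an approved design document) are all hypothetical.

```python
from dataclasses import dataclass, field

# Order of the document stack as described in the summary:
# spec -> design -> plan -> tests -> code.
STACK = ("spec", "design", "plan", "tests", "code")


@dataclass
class ChangeControl:
    """Hypothetical tracker of which artifacts have been approved."""

    approved: set[str] = field(default_factory=set)

    def approve(self, artifact: str) -> None:
        """Record approval for one artifact in the stack."""
        if artifact not in STACK:
            raise ValueError(f"unknown artifact: {artifact}")
        self.approved.add(artifact)

    def may_start(self, artifact: str) -> bool:
        """A stage may start only if every earlier artifact is approved."""
        position = STACK.index(artifact)  # raises ValueError if unknown
        return all(earlier in self.approved for earlier in STACK[:position])


if __name__ == "__main__":
    control = ChangeControl()
    control.approve("spec")
    print(control.may_start("design"))  # spec is approved
    print(control.may_start("code"))    # design, plan, tests still missing
```

The point of encoding the rule as a check, instead of restating it in each prompt, is that a skipped approval becomes a hard stop the session can report, which matches the summary's contrast between structural controls and ad hoc prompting corrections.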

The developer's candid acknowledgment that the SOP is likely overkill for some use cases points to an important calibration challenge facing the broader community of practitioners building with AI coding assistants. Lightweight, conversational workflows suit exploratory or solo projects, while the overhead of full traceability documentation becomes justifiable — and perhaps necessary — as project complexity, team size, and business stakes increase. The loan-tracker project described as the proving ground represents a relatively modest scope, yet the author found the formalized structure necessary even there. This suggests that the drift problem manifests earlier in AI-assisted development cycles than practitioners might anticipate, and that the industry has not yet converged on a standard methodology for managing it at scale.
