Introducing FLYWHEEL.md 🌀 — Claude Learning Daily

FLYWHEEL.md is a framework for managing autonomous AI agents that ship software while keeping humans in control at critical decision points. Drawing on Andrej Karpathy's AutoResearch methodology, it establishes a loop in which agents handle continuous tasks but pause at designated gates for human review. Designed as a single MIT-licensed file, it sits alongside AGENTS.md and SOUL.md as a reference for the agentic pipeline, defining when agents proceed and when humans must approve progression.

Detailed Analysis

The rapid proliferation of autonomous coding agents, including Claude Code, Cursor, OpenAI's Codex, and others, has created a new class of software development tooling capable of running continuous, unsupervised development loops through mechanisms like `/loop`, `/goal`, and cron-based scheduling. Against this backdrop, a developer in the Claude AI community has proposed FLYWHEEL.md, a lightweight MIT-licensed specification document designed to give agentic coding pipelines a structured, governable loop with defined human intervention points. The concept draws direct inspiration from Andrej Karpathy's AutoResearch project, which demonstrated how an ML agent could run overnight experiments autonomously, retaining only successful results. FLYWHEEL.md attempts to translate that same iterative feedback structure into the domain of production software shipping.

The core insight driving FLYWHEEL.md is that writing code has never been the primary bottleneck in software development; the harder challenges lie in deployment, production validation, failure analysis, and iterative improvement. By framing these post-coding stages as a "wheel" with discrete, defined stages, the document gives agents a structured understanding of what "done" means beyond mere code generation. Critically, each stage in the loop includes two components: a completion criterion ("done when ___") and a routing decision that determines whether the agent proceeds autonomously or pauses for human review. This gate-based model preserves human oversight at the moments of highest consequence while allowing agents to operate freely within lower-risk segments of the pipeline.
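
The two-part stage structure described above can be illustrated with a short Python sketch. Nothing here comes from FLYWHEEL.md itself: the `Stage` and `Route` types, the `run_wheel` function, and the example stage names and criteria are all hypothetical, chosen only to show how a completion criterion and a routing decision might combine in one pass around the wheel.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Route(Enum):
    PROCEED = "proceed"        # agent continues on its own
    HUMAN_GATE = "human_gate"  # agent pauses for human review


@dataclass
class Stage:
    name: str
    done_when: str               # human-readable completion criterion
    is_done: Callable[[], bool]  # hypothetical check for that criterion
    route: Route


def run_wheel(stages: List[Stage], approve: Callable[[Stage], bool]) -> List[str]:
    """Walk the stages once and return a log of what happened."""
    log = []
    for stage in stages:
        # A stage whose criterion is unmet stops the pass.
        if not stage.is_done():
            log.append(f"{stage.name}: not done ({stage.done_when})")
            break
        # A gated stage also needs explicit human approval to continue.
        if stage.route is Route.HUMAN_GATE and not approve(stage):
            log.append(f"{stage.name}: held at gate")
            break
        log.append(f"{stage.name}: passed")
    return log


# Illustrative stages; the names and criteria are invented for this sketch.
stages = [
    Stage("build", "tests pass", lambda: True, Route.PROCEED),
    Stage("deploy", "release approved", lambda: True, Route.HUMAN_GATE),
    Stage("observe", "metrics reviewed", lambda: True, Route.PROCEED),
]

# With a reviewer who declines, the pass stops at the gated stage.
print(run_wheel(stages, approve=lambda stage: False))
```

In this sketch the agent moves through `build` on its own and stops at `deploy` until the `approve` callback returns true, which is one plausible reading of keeping a human at the higher-consequence gates while leaving lower-risk stages autonomous.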

FLYWHEEL.md is positioned as the third element of an emerging canonical set of agent configuration documents, alongside AGENTS.md (which defines task scope and behavior) and SOUL.md (which defines agent identity and values). This convergence toward a small, structured canon of governance documents reflects a broader pattern in agentic AI development, where the community is organically developing conventions for controlling autonomous systems in the absence of formal industry standards. The proposal implicitly acknowledges that different deployment contexts — a CLI tool, a backend model service, a web application — require different loop configurations, meaning FLYWHEEL.md is not a one-size-fits-all checklist but a template for context-specific operational discipline.

The proposal arrives at a moment when the AI development community is grappling seriously with questions of autonomous agent safety and accountability. Claude Code, Anthropic's terminal-based coding agent, has become a particular focal point for these discussions given its capacity to execute multi-step tasks with minimal interruption. FLYWHEEL.md represents a grassroots, developer-driven response to the governance gap that emerges when agents can ship software around the clock: rather than waiting for top-down frameworks, practitioners are constructing their own lightweight accountability structures. The emphasis on keeping "a human at the gates that matter" echoes Anthropic's own stated philosophy around human oversight as a core safety property during the current period of AI development.

The broader significance of FLYWHEEL.md lies in what its emergence signals about the maturation of agentic software development as a practice. As autonomous coding agents move from novelty to infrastructure, the operational conventions surrounding them — how they are directed, constrained, and reviewed — are becoming as important as the underlying models themselves. The fact that this proposal is MIT-licensed and framed as a single, reviewable file reflects an understanding that governance tooling must be as lightweight and composable as the agents it governs, or practitioners will simply bypass it. Whether FLYWHEEL.md achieves canonical status alongside AGENTS.md remains to be seen, but its logic — structured loops, explicit gates, human accountability at high-stakes junctures — is likely to influence how agentic pipelines are designed as the tooling ecosystem continues to mature.


