Why I moved away from skills.md and CLAUDE.md for rule management and what I replaced them with.

A developer working with Claude Code on a production Magento 2 codebase identified three key limitations in using markdown files for rule management: inefficient token usage from loading entire files, lack of relationships between rules, and unenforced compliance. The developer created Writ, an open source Claude Code plugin that uses a Neo4j knowledge graph to retrieve only relevant rules per task (~1,600 tokens per query versus 83,000 in the full corpus) and bash hooks to enforce compliance structurally at the tool call level. This two-layer approach makes rules interconnected through graph relationships and prevents the agent from circumventing critical steps like testing or approval.

Detailed Analysis

A developer working extensively with Claude Code on a production Magento 2 codebase has publicly documented significant limitations in Claude Code's default rules management architecture and released an open-source plugin called Writ to address them. The post, shared to the Anthropic subreddit, identifies three structural deficiencies in the existing skills.md and CLAUDE.md approach: all-or-nothing context loading that injects entire files regardless of relevance, a flat file structure with no relational awareness between rules or skills, and the fundamentally advisory nature of the system, which allows the agent to ignore instructions without structural consequence. These are not edge cases or misuse scenarios — they are inherent properties of a markdown-based approach to agent configuration at scale.

The Writ plugin responds to these limitations with two distinct architectural layers. The first is a knowledge graph layer backed by Neo4j, where rules, skills, antipatterns, and playbooks exist as typed nodes connected by explicit relationships. A five-stage retrieval pipeline selects only contextually relevant nodes per query, reportedly cutting injected rule content from approximately 83,000 tokens (the full corpus) to roughly 1,600 tokens per query, or about 2% of the total. Rather than bloating the context window with irrelevant guidance, the system surfaces only what applies to the current task, and related nodes, such as authentication rules connected to a security guideline, are pulled in automatically through graph traversal. The second layer consists of 30 bash hook scripts that intercept tool calls at execution time, creating hard enforcement boundaries rather than soft prompting. The agent cannot write code without an approved plan, cannot declare tests as passing without running static analysis, and cannot self-approve its own work. These constraints are structural, not instructional.
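The retrieval idea can be illustrated with a minimal Python sketch. It is not Writ's five-stage pipeline and it does not use Neo4j: it swaps in an in-memory dictionary, tag matching for seed selection, and a crude four-characters-per-token estimate. The node names, tags, and the `retrieve` function are all hypothetical, chosen only to show how graph traversal pulls in a connected rule while an unrelated one stays out of the context.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str                      # e.g. "rule", "skill", "antipattern", "playbook"
    text: str
    tags: set = field(default_factory=set)
    related: list = field(default_factory=list)   # ids of connected nodes


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; a real system would use a tokenizer.
    return max(1, len(text) // 4)


def retrieve(graph: dict, query_tags: set, token_budget: int, max_hops: int = 1):
    """Select nodes matching the query tags, then follow relationships
    breadth-first up to max_hops, stopping additions at the token budget."""
    seeds = sorted(nid for nid, node in graph.items() if node.tags & query_tags)
    selected, seen, used = [], set(seeds), 0
    queue = deque((nid, 0) for nid in seeds)
    while queue:
        nid, hops = queue.popleft()
        node = graph[nid]
        cost = estimate_tokens(node.text)
        if used + cost > token_budget:
            continue               # node does not fit; its neighbours are not expanded
        selected.append(nid)
        used += cost
        if hops < max_hops:
            for neighbour in node.related:
                if neighbour in graph and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, hops + 1))
    return selected, used


# Hypothetical three-node corpus: a security guideline linked to an auth rule,
# plus an unrelated frontend rule.
GRAPH = {
    "sec-guideline": Node("sec-guideline", "rule",
                          "Validate all input at trust boundaries.",
                          {"security"}, ["auth-rule"]),
    "auth-rule": Node("auth-rule", "rule",
                      "Admin controllers must check ACL resources.",
                      {"auth"}),
    "css-rule": Node("css-rule", "rule",
                     "Use LESS variables for theme colors.",
                     {"frontend"}),
}

selected, used = retrieve(GRAPH, {"security"}, token_budget=1600)
```

Here a query tagged `security` matches only the guideline directly, and the authentication rule arrives through the relationship rather than through its own tags. The frontend rule is never loaded.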

The technical approach Writ takes reflects a broader and increasingly visible tension in agentic AI development: the gap between instructing an agent and constraining one. Markdown-based rule files operate as natural language guidance, which means they are subject to the same probabilistic behavior that governs all LLM outputs. An agent may follow such guidance consistently or inconsistently depending on context, competing pressures within the prompt, or attention limitations as context windows fill. This is manageable for casual use but becomes a liability in production environments where correctness, test coverage, and security compliance are non-negotiable. The developer's choice to route enforcement through bash hooks at the tool call level — rather than through additional prompting — represents an architectural shift from trusting the model to constraining its execution environment.
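The difference between instructing and constraining can be sketched as a gate that runs before a tool call executes. Writ's hooks are bash scripts; the Python below is only a stand-in for the same idea, not the project's code. The payload field `tool_name`, the tool names in `WRITE_TOOLS`, and the plan marker file are assumptions made for illustration and should be checked against Claude Code's hooks documentation before use.

```python
from pathlib import Path

# Assumption: names of the tools that modify files.
WRITE_TOOLS = {"Write", "Edit"}


def check_tool_call(payload: dict, plan_marker: Path) -> tuple[bool, str]:
    """Decide whether a tool call may proceed.

    payload     -- hypothetical description of the pending tool call
    plan_marker -- hypothetical file whose existence means a plan was approved
    """
    tool = payload.get("tool_name", "")
    if tool in WRITE_TOOLS and not plan_marker.exists():
        # The decision depends on state on disk, not on what the model was told.
        return False, f"Blocked {tool}: no approved plan found at {plan_marker}"
    return True, "ok"
```

A wrapper script would read the pending call, pass it to `check_tool_call`, and signal a block to the agent runtime when the first element is `False`. The model cannot talk its way past the check, because the check never consults the prompt.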

This development sits within a wider pattern of sophisticated Claude Code users outgrowing the default scaffolding and building custom infrastructure around the agent. The Claude Code plugin and hook ecosystem was designed to enable exactly this kind of extension, and Writ demonstrates both the ceiling of the built-in approach and the floor of what custom tooling can achieve. The knowledge graph retrieval model in particular points toward a direction that mainstream agent frameworks have been slow to adopt: treating agent knowledge as a structured, queryable resource rather than a flat document corpus. Projects like Writ, even when niche in their initial audience, often surface patterns that eventually inform how the underlying platforms evolve — particularly when they address failure modes that affect reliability in professional deployments rather than experimental use.

The post is notably measured in tone, framing the documented limitations not as criticisms of Anthropic but as engineering problems worth solving. That framing matters, because it signals the author's intent to contribute to the ecosystem rather than simply critique it. The public release of Writ under an open-source license, combined with documentation of the specific architectural boundary where Claude Code's progressive disclosure pattern breaks down, positions the project as a reference implementation for production-grade Claude Code deployments. Whether Anthropic incorporates graph-backed retrieval or hook-level enforcement into native Claude Code infrastructure remains to be seen, but the technical case made by this project is substantive enough to warrant attention from developers operating Claude Code in high-stakes environments.
