I built a plugin that replaces Claude Code's skill file approach with a knowledge graph and bash hook enforcement

A developer created Writ, a Claude Code plugin replacing markdown skill files with a Neo4j knowledge graph and bash hook enforcement to address context scaling and enforcement limitations. A five-stage RAG pipeline retrieves only relevant rules per query (approximately 1,600 tokens instead of the full 83,000-token corpus), while 30 bash hook scripts prevent code generation without approved plans and passing static analysis checks. The approach enforces requirements at the tool call level rather than relying solely on prompt instructions.

Detailed Analysis

A developer working extensively with Claude Code for Magento 2 development has published an open-source plugin called Writ that addresses two structural limitations in Anthropic's current skill file architecture: context window inefficiency and the absence of hard enforcement mechanisms. The plugin, released on GitHub under the handle infinri, replaces the default markdown-based rules and skills system with a Neo4j knowledge graph and a suite of bash hook scripts wired directly into Claude Code's tool call lifecycle. The author frames the project not as a criticism of Claude Code but as a practical response to scaling problems that emerge once a developer's rule corpus grows large enough to degrade model performance through context saturation.

The retrieval component is technically the more novel of the two systems. Loading all rules, antipatterns, and playbooks into context on every turn becomes increasingly costly and noisy as the corpus grows, so Writ instead stores those artifacts as typed graph nodes and runs a five-stage RAG pipeline against them. The result is a compression ratio of roughly 52 to 1: about 83,000 tokens of encoded knowledge reduced to approximately 1,600 tokens of contextually relevant material per query, or about 2% of the corpus, which the author reports is delivered at sub-millisecond latency. This approach reflects a well-understood problem in long-context LLM deployment: raw context length does not translate linearly into useful attention. The model's effective recall degrades with noise, and structured retrieval is a more principled solution than simply expanding the window.
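
The core idea, selecting a small relevant subset of typed nodes under a token budget, can be sketched as follows. This is a minimal illustration in Python and not Writ's actual pipeline: the article does not describe the five stages, so the `RuleNode` type, the word-overlap `score` function, and the `retrieve` helper are all hypothetical stand-ins, and the graph is replaced by an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleNode:
    """Hypothetical typed node; Writ's real schema is not given in the source."""
    node_id: str
    kind: str      # e.g. "rule", "antipattern", "playbook"
    text: str
    tokens: int    # assumed to be precomputed when the node is stored


def score(node: RuleNode, query_terms: set[str]) -> int:
    """Toy relevance score: number of query words that appear in the node text."""
    return len(set(node.text.lower().split()) & query_terms)


def retrieve(nodes: list[RuleNode], query: str, budget_tokens: int):
    """Return (selected nodes, tokens used), never exceeding budget_tokens."""
    query_terms = set(query.lower().split())
    # Keep only nodes with some overlap; best score first, cheaper nodes break ties.
    ranked = sorted(
        (n for n in nodes if score(n, query_terms) > 0),
        key=lambda n: (-score(n, query_terms), n.tokens),
    )
    selected, used = [], 0
    for node in ranked:
        if used + node.tokens <= budget_tokens:
            selected.append(node)
            used += node.tokens
    return selected, used
```

With a budget near 1,600 tokens, a selector of this shape would cap what reaches the model regardless of how large the stored corpus grows, which is the property the article attributes to Writ. A real system would replace the word-overlap score with embedding or graph-aware ranking.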

The enforcement layer is arguably the more consequential innovation from a safety and workflow reliability perspective. Writ's 30 bash scripts attach to Claude Code's PreToolUse, PostToolUse, and SessionEnd hook points, creating hard gates at the tool call level rather than relying on natural language instructions. In work mode, the system structurally prevents Claude from writing code before a plan and tests are approved. It also blocks the model from asserting that tests pass without first running a battery of static analyzers (PHPStan, ESLint, ruff, cargo check, go vet, and custom analyzers for injection, authentication, cryptographic misuse, and N+1 query patterns), and it makes self-approval architecturally impossible. This distinction between advisory and enforced controls matters: prompt instructions can be ignored, rationalized around, or simply forgotten across a long session, while a hook-level block stops the tool call regardless of what the model intends.
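
A gate of this kind can be sketched as a pure decision function. Writ implements its hooks as bash scripts, so the Python below is only an illustration of the logic, not the plugin's code. It assumes the Claude Code hook convention in which a PreToolUse hook receives a JSON event containing `tool_name` and blocks the call by exiting with code 2; the `WRITE_TOOLS` set, the `state` dictionary, and its `plan_approved` and `tests_approved` keys are hypothetical names.

```python
# Tool names treated as "writing code" (an assumption for this sketch).
WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}

ALLOW = 0   # exit code that lets the tool call proceed
BLOCK = 2   # exit code assumed to block the call and return the message to the model


def gate(event: dict, state: dict) -> tuple[int, str]:
    """Decide whether a pending tool call may run.

    `event` is the parsed hook payload; `state` is approval state that only a
    human-driven step outside the model is allowed to modify (hypothetical).
    """
    if event.get("tool_name") not in WRITE_TOOLS:
        return ALLOW, ""
    if not state.get("plan_approved", False):
        return BLOCK, "Blocked: no approved plan for this task."
    if not state.get("tests_approved", False):
        return BLOCK, "Blocked: tests have not been approved."
    return ALLOW, ""
```

A wrapper script would parse the event from standard input, call `gate`, print the message to standard error, and exit with the returned code. Because the approval state lives outside the conversation, the model cannot talk its way past the check.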

The project surfaces a meaningful tension in how agentic AI systems are currently being productized. Anthropic's Agent Skills framework, and the broader category of markdown-based context injection it represents, optimizes for human readability and ease of authorship but does not account for the compounding costs and diminishing returns that emerge at scale. The progressive disclosure pattern the author references — revealing rules and skills to the model incrementally as tasks demand them — is a reasonable design heuristic in theory, but the implementation through flat files lacks the semantic structure needed to make retrieval precise. A graph-backed alternative, as Writ demonstrates, allows relationships between concepts to participate in retrieval decisions, not just keyword overlap or file organization.
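
To make the contrast with flat files concrete, the sketch below shows how relationships can pull in related nodes that share no keywords with the query. It is a hypothetical illustration in Python using an in-memory edge list rather than Neo4j; the `expand` function, the node identifiers, and the relationship names are invented for the example and are not taken from Writ.

```python
from collections import deque


def expand(seeds: list[str], edges: list[tuple[str, str, str]], max_hops: int = 1) -> dict[str, int]:
    """Breadth-first expansion from seed node ids along directed edges.

    `edges` holds (source, relationship, target) triples. Returns a mapping of
    every reached node id to its hop distance from the nearest seed.
    """
    adjacency: dict[str, list[str]] = {}
    for src, _rel, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    reached = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if reached[current] >= max_hops:
            continue  # do not walk past the hop limit
        for neighbor in adjacency.get(current, []):
            if neighbor not in reached:
                reached[neighbor] = reached[current] + 1
                queue.append(neighbor)
    return reached
```

If a keyword match finds only a dependency-injection rule, one hop along a "violated by" edge can surface the matching antipattern, and a second hop can surface the playbook that fixes it, even though neither mentions the query terms.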

More broadly, Writ reflects a growing pattern in the developer community around frontier AI coding tools: practitioners who use these systems intensively are discovering that the default scaffolding is optimized for onboarding and demonstration rather than production-grade reliability. The gap between what a model is instructed to do and what it can be structurally compelled to do is becoming a first-class engineering problem, particularly as agentic workflows move from single-turn completions to multi-step autonomous execution. Projects like Writ are early evidence that the infrastructure layer around AI coding assistants — retrieval architecture, enforcement primitives, audit hooks — may be as important to real-world utility as the underlying model capabilities themselves.
