
How do you manage the CLAUDE.md (and AGENTS.md) across a repository that's becoming too huge for an instructions file?

Reddit · KlausWalz · June 3, 2026
A development team maintains a centralized AGENTS.md file that instructs AI agents on codebase-specific tasks and workflows, which has improved code quality and reduced development time for both experienced and novice developers. The file has grown from approximately 500 words to over 4000 words with essential documentation about packaging, testing, and strategies to prevent common AI mistakes such as hallucination and unnecessary attack surface expansion. To address the file's size, the team is considering fragmenting the instructions to remove language-specific guidelines that become irrelevant when agents work in different parts of the polyglot monorepo.

Detailed Analysis

A development team managing a large monorepo has surfaced a practical scaling problem in AI-assisted software development: centralized instruction files such as CLAUDE.md and AGENTS.md become less useful as they grow. The team reports that its shared instruction file, originally around 500 words, now exceeds 4,000 words of useful, human-authored guidance covering complex packaging workflows, testing routines, and AI-specific guardrails against common failure modes such as hallucinated documentation, reliance on stale training data, and unnecessary expansion of the attack surface. By the team's account the file has been effective, helping junior developers ship higher-quality code and reducing development time for experienced engineers, but its growth now threatens the efficiency it was meant to provide.

The core tension is between comprehensiveness and contextual relevance. In a polyglot monorepo spanning technologies such as C++ and React, loading the entire instruction corpus for every agent session introduces noise and wastes tokens. The poster's proposed solution is to split the file by language or domain, so that a React-focused agent session does not need to parse C++ coding conventions. This reflects a sound instinct toward hierarchical or modular context management, and it mirrors established software engineering principles such as separation of concerns and lazy loading, now applied to the prompt engineering layer. Claude and similar agents operate most effectively when context is precise and scoped, so a monolithic instruction file, however well written, increasingly works against how these systems consume and prioritize information.
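The splitting idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the poster's tooling: it assumes the monolithic file marks each domain with a "## " heading, and the function names, heading names ("C++", "React"), and sample rules are all hypothetical.

```python
def split_by_heading(markdown, prefix="## "):
    """Split a markdown document into sections keyed by heading text.

    Text before the first heading is stored under the key "general".
    """
    sections = {"general": []}
    current = "general"
    for line in markdown.splitlines():
        if line.startswith(prefix):
            # A new heading starts a new section.
            current = line[len(prefix):].strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


def select_context(sections, domains):
    """Keep the shared preamble plus only the requested domain sections."""
    wanted = ["general", *domains]
    return "\n\n".join(sections[name] for name in wanted if sections.get(name))


# Hypothetical monolithic instruction file, for illustration only.
MONOLITH = """\
Always run the test suite before committing.

## C++
Use clang-format before pushing.

## React
Prefer function components with hooks.
"""

sections = split_by_heading(MONOLITH)
react_context = select_context(sections, ["React"])
print(react_context)
```

A React-focused session would then receive the shared preamble and the React section, while the C++ conventions stay out of its context.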

This situation reflects a broader maturation curve in enterprise adoption of agentic coding tools. Early adoption typically involves a single, catch-all instruction file that captures tribal knowledge about a codebase. As teams accumulate experience and the files grow denser, the need for structured, hierarchical context management becomes apparent. Anthropic's own documentation for Claude Code describes nested CLAUDE.md files placed at relevant directory levels within a repository, allowing agents to load only the context pertinent to the subdirectory they are operating in. This approach turns codebase-specific knowledge into a tree that mirrors the repository's own organization, enabling more targeted and efficient agent behavior without sacrificing institutional knowledge.
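To make the tree-shaped lookup concrete, here is a minimal Python sketch that collects instruction files from the repository root down to the directory an agent is working in. It is not how Claude Code or any other agent is implemented; the function name `instruction_files`, the directory layout, and the file contents are assumptions made for illustration.

```python
from pathlib import Path
from typing import List
import tempfile


def instruction_files(repo_root: Path, working_dir: Path, name: str = "CLAUDE.md") -> List[Path]:
    """Return instruction files from repo_root down to working_dir.

    Broader guidance comes first, so more specific files are read last.
    """
    repo_root = repo_root.resolve()
    working_dir = working_dir.resolve()
    # Raises ValueError if working_dir is not inside repo_root.
    relative = working_dir.relative_to(repo_root)
    found = []
    current = repo_root
    # The empty first part makes the loop check the root itself.
    for part in ("", *relative.parts):
        current = current / part
        candidate = current / name
        if candidate.is_file():
            found.append(candidate)
    return found


# Build a throwaway polyglot layout (hypothetical) to exercise the lookup.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "frontend" / "web").mkdir(parents=True)
    (root / "native").mkdir()
    (root / "CLAUDE.md").write_text("Shared rules\n")
    (root / "frontend" / "CLAUDE.md").write_text("React rules\n")
    (root / "native" / "CLAUDE.md").write_text("C++ rules\n")

    found = instruction_files(root, root / "frontend" / "web")
    names = [p.relative_to(root.resolve()).as_posix() for p in found]
    print(names)
```

In this layout a session working under frontend/web picks up the shared root file and the frontend file, and never sees the C++ guidance stored under native.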

The broader implication is that managing AI context files is becoming a distinct engineering discipline within software teams, analogous to documentation architecture or API design. Teams that structure these files thoughtfully, rather than treating them as ad hoc accumulations, are likely to see compounding returns as their AI-assisted workflows scale. That this team's instruction file has grown organically to 4,000 words of high-signal content is itself evidence of a healthy human-AI collaboration loop, in which humans encode real domain knowledge rather than delegating that responsibility entirely to the model. The next step is building tooling and conventions that make this encoded knowledge composable, versionable, and context-aware, challenges that the broader developer community is only beginning to address systematically.
