Claude Code Source Deep Dive - Part VII: Multi-Agent System

Part VII of the Claude Code Source Deep Dive series covers context compression mechanisms. A compaction prompt enforces text-only responses and summarizes the conversation chronologically, tracking technical details and errors. A memory extraction agent manages persistent memories of four types (user, feedback, project, and reference) using a parallel two-turn execution strategy. A session memory system tracks work via a 10-section template whose sections include current state, task specification, relevant files, workflows, errors, codebase documentation, learnings, and a worklog. Auto-compaction fires at a 13,000-token buffer to keep context manageable during extended development sessions.

Detailed Analysis

Claude Code's internal architecture, as revealed through source code analysis in this ongoing deep-dive series, uses a multi-layered memory and context management system designed to maintain continuity across long-running development sessions. The compaction subsystem, located in `src/services/compact/prompt.ts`, triggers automatically as a session's token count approaches predefined thresholds: a warning at 20,000 tokens remaining and the hard compaction trigger at 13,000 tokens. When compaction fires, a structured prompt instructs Claude to produce a plain-text summary organized into an `<analysis>` block and a `<summary>` block, capturing chronological events, technical decisions, file edits, error resolution, and pending tasks. A circuit breaker caps consecutive compaction failures at three, preventing runaway retry loops. When the session resumes, the system injects a recovery message that references the transcript path and instructs the model to continue without meta-commentary or acknowledgment of the interruption.
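The threshold check and circuit breaker described above can be sketched as follows. This is a minimal illustration, not the code in `src/services/compact/`: every identifier (`contextAction`, `CompactionBreaker`, the constant names) is hypothetical, and it assumes both thresholds are measured as tokens remaining in the context window. Only the values 20,000, 13,000, and three consecutive failures come from the article.

```typescript
// Hypothetical names throughout; only the numeric values come from the article.
const WARNING_BUFFER_TOKENS = 20_000; // warn when this few tokens remain
const COMPACT_BUFFER_TOKENS = 13_000; // compact when this few tokens remain
const MAX_CONSECUTIVE_FAILURES = 3;   // circuit breaker limit

type ContextAction = "none" | "warn" | "compact";

// Decide what to do from how many tokens are left in the context window.
// Assumption: both thresholds are expressed as remaining tokens.
function contextAction(contextWindow: number, tokensUsed: number): ContextAction {
  const remaining = contextWindow - tokensUsed;
  if (remaining <= COMPACT_BUFFER_TOKENS) return "compact";
  if (remaining <= WARNING_BUFFER_TOKENS) return "warn";
  return "none";
}

// Counts consecutive compaction failures and refuses further attempts
// once the cap is reached. A successful attempt resets the counter.
class CompactionBreaker {
  private failures = 0;

  get isOpen(): boolean {
    return this.failures >= MAX_CONSECUTIVE_FAILURES;
  }

  // Runs one compaction attempt. Returns the summary, or null when the
  // attempt was skipped (breaker open) or threw.
  attempt(compact: () => string): string | null {
    if (this.isOpen) return null;
    try {
      const summary = compact();
      this.failures = 0;
      return summary;
    } catch {
      this.failures += 1;
      return null;
    }
  }
}
```

Once the breaker is open, `attempt` returns `null` without calling `compact` again, which is the property that stops a runaway retry loop. How the real system recovers from that state is not described in the article.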

The memory extraction agent, found in `src/services/extractMemories/prompts.ts`, operates as a subagent with constrained tool access (read-only Bash, and file writes restricted to a dedicated memory directory) and is explicitly budgeted to complete its work in two turns: one for parallel reads, one for parallel writes. The agent categorizes persistent memories into four types (user, feedback, project, and reference) and stores each memory in its own frontmatter-formatted file, with a pointer maintained in a central `MEMORY.md` index. The system also enforces strict boundaries on what may be committed to memory: code patterns, architectural conventions, debugging recipes, and anything already covered by `CLAUDE.md` files are explicitly excluded, on the grounds that they can be derived from authoritative sources such as the codebase itself or git history. This design reflects a deliberate philosophy of avoiding redundant state and trusting the code as the ground truth.
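To make the storage layout concrete, here is a rough sketch of one memory file and its index pointer. The article states only that each memory has a type, lives in a frontmatter-formatted file, and is referenced from `MEMORY.md`; the frontmatter field names (`name`, `description`, `type`), the file-naming scheme, and all function names below are assumptions made for illustration.

```typescript
// The four memory types named in the article.
type MemoryType = "user" | "feedback" | "project" | "reference";

interface Memory {
  type: MemoryType;
  name: string;        // hypothetical field
  description: string; // hypothetical field
  body: string;
}

// Render one memory as a frontmatter-formatted file.
// The field names are assumed, not taken from the source.
function renderMemoryFile(m: Memory): string {
  return [
    "---",
    `name: ${m.name}`,
    `description: ${m.description}`,
    `type: ${m.type}`,
    "---",
    "",
    m.body,
    "",
  ].join("\n");
}

// Derive a file name from the type and a slug of the name (assumed scheme).
function memoryFileName(m: Memory): string {
  const slug = m.name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${m.type}_${slug}.md`;
}

// Build the one-line pointer that would be added to the MEMORY.md index.
function indexLine(m: Memory): string {
  return `- [${m.name}](${memoryFileName(m)}): ${m.description}`;
}
```

Keeping the index to one line per memory means a later session can scan `MEMORY.md` cheaply and open only the files that look relevant.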

The session memory system, governed by `src/services/SessionMemory/prompts.ts`, structures working context around ten fixed sections, including current state, task specification, errors and corrections, workflow commands, and a step-by-step worklog. Each section is capped at 2,000 tokens and the file as a whole at 12,000 tokens; because ten full sections would total 20,000 tokens, the overall budget is the tighter constraint and not every section can reach its cap at once. Section headers and their italic descriptions are treated as immutable scaffolding, with updates permitted only to the content beneath. The system is instructed to write dense, concrete entries that include file paths, function names, and exact error messages rather than abstract summaries, and updates are batched through parallel Edit tool calls. This granular structure allows a resumed session to reconstruct not just what was accomplished but the procedural and error context surrounding it, which is critical for agentic coding tasks where sequential tool use and iteration are the norm.
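The two budgets can be checked with a few lines of code. The sketch below is illustrative only: it assumes sections begin with a `# ` header line, counts everything under a header (including any description line) toward that section, and estimates tokens at roughly four characters each, a crude heuristic standing in for a real tokenizer. The names `checkBudget` and `estimateTokens` are hypothetical; the 2,000 and 12,000 limits are the ones quoted in the article.

```typescript
const MAX_SECTION_TOKENS = 2_000; // per-section cap from the article
const MAX_TOTAL_TOKENS = 12_000;  // whole-file budget from the article

// Rough estimate: about four characters per token (heuristic, not a tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface Section {
  title: string;
  lines: string[];
}

interface BudgetReport {
  totalTokens: number;
  oversizedSections: string[];
  overTotalBudget: boolean;
}

// Split the notes on "# " header lines (assumed format) and compare each
// section body, and the sum of all bodies, against the two limits.
function checkBudget(notes: string): BudgetReport {
  const sections: Section[] = [];
  for (const line of notes.split("\n")) {
    if (line.indexOf("# ") === 0) {
      sections.push({ title: line.slice(2).trim(), lines: [] });
    } else if (sections.length > 0) {
      sections[sections.length - 1].lines.push(line);
    }
  }

  let totalTokens = 0;
  const oversizedSections: string[] = [];
  for (const section of sections) {
    const tokens = estimateTokens(section.lines.join("\n"));
    totalTokens += tokens;
    if (tokens > MAX_SECTION_TOKENS) oversizedSections.push(section.title);
  }

  return {
    totalTokens,
    oversizedSections,
    overTotalBudget: totalTokens > MAX_TOTAL_TOKENS,
  };
}
```

A report with a non-empty `oversizedSections` list or `overTotalBudget` set would be the signal to condense older entries before writing new ones.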

Taken together, these systems represent a serious engineering response to one of the fundamental constraints of large language model deployment: finite context windows in long-horizon agentic tasks. Rather than treating compaction as a last resort or a crude truncation, Anthropic has built a tiered architecture that distinguishes between ephemeral working memory (session notes), medium-term procedural memory (compaction summaries), and long-term persistent memory (the extracted memory files). The explicit prohibition against saving derivable information — and the instruction to defer to git history or the codebase itself — reflects an emerging design principle in agentic AI systems: persistent memory should store what cannot be cheaply reconstructed, not everything that might be useful.

The broader significance of this architecture lies in what it reveals about the engineering challenges of deploying capable AI coding agents in real-world development environments. As Claude Code operates on increasingly complex, multi-session projects, the ability to maintain coherent state across context boundaries becomes a competitive differentiator. The multi-agent framing, in which compaction, memory extraction, and session memory updates are handled as separate, narrowly scoped jobs (the memory extraction subagent, for instance, runs with constrained tool access and a two-turn budget), mirrors architectural patterns emerging across the AI industry for managing reliability and resource efficiency in agentic pipelines. Anthropic's approach treats context management not as a background concern but as a first-class system design problem, with structured prompts, hard token budgets, and failure circuit breakers that together aim to make long-running agentic sessions both recoverable and auditable.
