← Reddit

Model: opus now defaults to 1M context - how to limit MAX_TOKENS for subagents & team agents?

Reddit · farono · April 23, 2026
Claude Code recently changed the opus model default from 200k to 1M context window. A developer working with team agents found this extension problematic for the team lead coordinator role and could not use the `CLAUDE_CODE_MAX_OUTPUT_TOKENS` environment variable to limit only the coordinator without affecting other team agents. The developer sought guidance on implementing agent-specific maximum context length configurations.

Detailed Analysis

Claude Code's recent update to default the `opus` model to a 1 million token context window — up from its previous 200,000 token ceiling — has introduced a practical configuration challenge for developers building multi-agent systems. The Reddit post in question highlights a specific friction point: when a user defines a team-lead subagent via Claude Code's frontmatter system using `model: opus`, that agent now silently inherits the 1M context default. The developer's core concern is not merely about cost, but about architectural control — specifically, that a coordinator agent does not need or benefit from a massive context window, and that the existing environment variable `CLAUDE_CODE_MAX_OUTPUT_TOKENS` applies globally across all agents in a session, making it unsuitable for per-agent tuning.

The technical solution centers on the `max_tokens` API parameter, which can be applied at the individual agent-call level rather than as a session-wide environment variable. By setting `max_tokens` directly in API requests for each subagent or team agent, developers can cap output token generation per completion without affecting the behavior of sibling or parent agents. For coordinator or team-lead agents, values in the range of 4,000 to 8,000 tokens are generally sufficient for orchestration tasks, while subagents handling heavier analytical workloads can be independently configured with higher ceilings. This granular control is especially relevant in Claude Code's team agents architecture, where agents are spawned with distinct roles and different context requirements, and where one-size-fits-all token limits can either waste resources or artificially constrain specialized agents.

The shift to 1M context as a default reflects Anthropic's broader strategy to position Claude Opus as a long-horizon reasoning and agent-orchestration model capable of ingesting entire codebases, long document chains, or extended agent memory in a single prompt. Claude Opus 4.6 introduced the 1M context alongside 128,000-token output support — a pairing designed specifically to enable agents to both read and generate at scale without splitting work across multiple requests. However, this architectural ambition creates a mismatch for developers who want lean, fast, cost-efficient coordinator agents that do little heavy lifting and primarily route tasks. Auto-compaction, which summarizes conversation history when context limits approach, triggers less frequently at 1M than at 200K, which may actually reduce some overhead — but it also means that context bloat can accumulate further before being addressed.

The broader trend this post reflects is the growing complexity of managing multi-agent systems as frontier AI models acquire more capable defaults. As Anthropic and other labs extend context windows and increase default output capacities, developers face an inversion of the traditional problem: rather than fighting for more tokens, they must now deliberately constrain them to optimize for latency, cost, and architectural clarity. The lack of per-agent context controls baked into Claude Code's frontmatter syntax — as opposed to relying on API-level parameters or prompt engineering tricks like `task_budget` instructions — represents a gap that will likely need to be addressed as agentic workflows become more sophisticated. For now, the recommended path combines explicit `max_tokens` settings at the API call level with careful agent role design to ensure that context capacity is matched to task complexity rather than inherited as a default.

Read original article →