TIL: `opusplan` can burn MORE context than full Opus on large tasks (and why)

Opusplan consumes more context tokens than plain Opus on large tasks because its planning phase is capped at 200K context while plain Opus accesses the full 1M window, triggering earlier compaction that increases token costs. The accumulated history persists through the execution phase where Sonnet processes the full context, perpetuating the inefficiency. A more efficient approach separates planning from execution: using Opus to generate a plan document in one session, then feeding that plan to Sonnet or Opus in a fresh session for execution without the compaction overhead.

Detailed Analysis

A counterintuitive behavior in Claude Code's `opusplan` mode has surfaced among power users on the Max plan: the hybrid planning mode, ostensibly designed to conserve tokens by pairing Opus's reasoning with Sonnet's execution, can paradoxically consume *more* context than simply using full Claude Opus end-to-end on large, complex tasks. The root cause lies in a structural asymmetry between the two configurations. The `opusplan` mode caps its planning phase at a 200K context window, which means compaction — the automatic process of summarizing older conversation segments to free up space — triggers far earlier during that planning phase than it would under full Opus, which on Max, Team, and Enterprise plans enjoys a 1 million token context window. Compaction is not free: the summarization process itself consumes tokens. Because the accumulated session history is then passed in full to Sonnet during the execution phase regardless, users end up paying the compaction cost without escaping the downstream context bloat, negating the efficiency rationale for the mode on large inputs.

The broader mechanics of why this happens connect to how context windows interact with user behavior and model architecture. Research context surrounding the article notes that larger nominal context capacity tends to encourage users to load more data indiscriminately — entire codebases, sprawling deployment configurations, verbose histories — under the assumption that the model can productively "digest" everything present. This assumption breaks down in practice: token capacity and token utility are not the same thing, and models reasoning over massive, unfocused contexts burn tokens rapidly without proportional gains in output quality. In `opusplan`, this dynamic is compounded because Opus is specifically handling the high-token-cost reasoning phase, and the 200K ceiling means that ceiling is reached faster on large technical tasks like cloud deployments, triggering compaction spirals that degrade both efficiency and continuity.

The workaround identified by the original poster reflects a principled re-architecture of how Claude Code sessions are structured: use Opus in a dedicated Session 1 solely for planning and persist the output as an external document artifact, then open a clean Session 2 — with Sonnet or Opus — and supply only the plan document alongside whatever targeted context is needed for execution. This pattern eliminates compaction cascades entirely by keeping each session's token load focused and bounded. It also produces a reusable artifact that can seed multiple independent execution sessions, which is particularly valuable in iterative deployment workflows where the same architectural plan might be executed across different environments or revisited after changes.

The practical implication for Claude Code users on higher-tier plans is that `opusplan` occupies a narrower optimal-use band than its marketing position might suggest. It performs well on self-contained tasks that comfortably fit within 200K tokens — focused feature implementations, contained refactors, discrete debugging sessions. For large-scale infrastructure work, multi-file architectural changes, or any task where accumulated context is likely to grow significantly during execution, the mode's design constraints actively work against the efficiency it promises. This is consistent with a broader pattern in large language model tooling where hybrid or "smart" routing modes introduce hidden tradeoffs that only become visible at scale, often surprising users who adopted the mode precisely because they anticipated higher resource demands.

The episode also highlights a maturation challenge facing agentic coding tools as they evolve toward longer-horizon autonomous workflows. As tasks grow in complexity — multi-step cloud deployments, large codebase refactors, extended planning cycles — the interaction between context window limits, compaction mechanisms, model routing logic, and session state management becomes a first-order engineering concern for users, not just for the underlying platform. Anthropic's subsequent improvements to long-context handling in Opus 4.7 over 4.6 suggest the company is aware of these pressure points, but the `opusplan` ceiling issue illustrates that architectural defaults optimized for common cases can create significant friction at the edge cases that technically sophisticated users disproportionately encounter.

Read original article →

Detailed Analysis

Don't Miss a Deploy