PSA: How to save tokens when using dynamic workflows

Dynamic workflows in Claude default to running every subagent on the same model as the main session. A session started on an expensive model such as Opus can therefore consume tokens, and money, far faster than users expect. The fix is to route planning, strategy, and orchestration tasks to Opus while directing implementation work to a cheaper model such as Sonnet. Users can do this with an explicit prompt structure, or automate it with a community-built prompt improver hook.

Detailed Analysis

A community-generated advisory circulating on Reddit's r/ClaudeAI addresses a widespread and costly misunderstanding about how Claude's dynamic workflow system allocates model resources across multi-agent pipelines. The post observes that users are inadvertently incurring large token bills because the default behavior of dynamic workflows assigns every subagent the same model as the primary session. When a user initiates a session using a premium model such as Opus, every one of the potentially dozens or hundreds of spawned subagents also runs on Opus, multiplying costs dramatically. The author clarifies this is working as documented, not a bug, and proposes a straightforward mitigation: explicitly route high-reasoning tasks like planning and orchestration to Opus while delegating implementation-level work to the less expensive Sonnet model.
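
To make the multiplication concrete, the sketch below runs the back-of-the-envelope arithmetic for the inherited-model default against tiered routing. The per-token prices are placeholder relative units chosen for illustration, not Anthropic's actual pricing. The token counts and the subagent count are also hypothetical. Substitute current published rates and your own workload figures before drawing conclusions.

```python
# Back-of-the-envelope cost model for a multi-agent run.
# ASSUMPTION: prices are placeholder relative units per million tokens,
# not real Anthropic pricing; only the ratio between tiers matters here.
PRICE_PER_MTOK = {"opus": 5.0, "sonnet": 1.0}


def run_cost(orchestrator_mtok: float, subagent_mtok: float, n_subagents: int,
             orchestrator_model: str, subagent_model: str) -> float:
    """Orchestrator tokens plus n_subagents * tokens-per-subagent, each at its model's price."""
    orchestrator = orchestrator_mtok * PRICE_PER_MTOK[orchestrator_model]
    subagents = n_subagents * subagent_mtok * PRICE_PER_MTOK[subagent_model]
    return orchestrator + subagents


# Default described in the post: every subagent inherits the session model.
inherited = run_cost(2.0, 0.5, 40, "opus", "opus")
# Tiered routing: Opus plans and orchestrates, Sonnet implements.
tiered = run_cost(2.0, 0.5, 40, "opus", "sonnet")

print(f"inherited: {inherited:.1f} units")  # 110.0 units
print(f"tiered:    {tiered:.1f} units")     # 30.0 units
print(f"savings:   {1 - tiered / inherited:.0%}")  # 73%
```

The subagent term grows linearly with `n_subagents`, so in a large fan-out it dominates the total. That term is where tiering pays off. The exact savings figure depends entirely on the assumed price ratio.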

The practical guidance centers on deliberate model tiering within multi-agent architectures. By structuring prompts to invoke plan mode prior to execution and explicitly naming which model handles which stage of a workflow, users can preserve the quality benefits of frontier models while avoiding their cost at scale. The author also references an open-source "prompt improver hook" on GitHub designed to automate this routing behavior, suggesting a growing ecosystem of community-built tooling aimed at making Claude's agentic capabilities more cost-manageable in practice.
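
A minimal sketch of the tiering idea follows. It builds a table that maps workflow stages to model tiers, plus a helper that renders the table into the kind of explicit instruction the post recommends placing ahead of a task. The stage names, the `model_for` and `build_routing_preamble` helpers, and the instruction wording are all hypothetical. The sketch calls no Claude API and does not reproduce the GitHub hook mentioned above. It only shows how the routing decision can be written down explicitly rather than inherited.

```python
# Hypothetical stage-to-model routing table; names are illustrative, not a Claude API.
STAGE_MODEL = {
    "planning": "opus",
    "strategy": "opus",
    "orchestration": "opus",
    "implementation": "sonnet",
}
# ASSUMPTION: any stage not listed falls back to the cheaper tier.
DEFAULT_MODEL = "sonnet"


def model_for(stage: str) -> str:
    """Return the model tier for a stage, defaulting to the cheaper model."""
    return STAGE_MODEL.get(stage.lower(), DEFAULT_MODEL)


def build_routing_preamble(stages: list[str]) -> str:
    """Render an explicit routing instruction to prepend to a task prompt."""
    lines = ["Enter plan mode before executing anything."]
    for stage in stages:
        lines.append(f"- {stage}: use {model_for(stage)}")
    lines.append("Do not let subagents inherit the session model by default.")
    return "\n".join(lines)


print(build_routing_preamble(["planning", "orchestration", "implementation", "testing"]))
```

Unknown stages default to the cheaper tier, which inverts the behavior the post warns about. Under this scheme, escalating a stage to Opus becomes a deliberate choice rather than an inherited default.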

The issue reflects a broader challenge inherent to multi-agent AI systems: the economic implications of architectural defaults are not always intuitive to end users. Orchestration frameworks are becoming more powerful and can spin up large numbers of concurrent agents. As they do, total cost grows with every spawned agent, because each one consumes its own context and output tokens. This scaling pattern differs fundamentally from single-turn or even multi-turn conversations. Users accustomed to chatting with Claude in a standard interface are entering agentic workflows without a mental model calibrated to parallel compute expenditure.

This pattern connects directly to a wider trend in the AI industry toward cost-performance optimization as agentic deployments scale. Anthropic, like other model providers, offers a tiered model family (Opus, Sonnet, Haiku) precisely so that developers and power users can match model capability to task complexity. The challenge, as this advisory highlights, is that the tooling does not yet enforce or surface these tradeoffs by default, leaving the optimization burden on the user. As dynamic workflows become a standard paradigm rather than an advanced feature, pressure will likely mount on platforms to implement smarter default routing or, at minimum, more prominent cost-awareness mechanisms at the session level.

The community response to this kind of issue (user-authored PSAs, open-source hooks, and shared prompt templates) shows a technically sophisticated user base developing informal governance around cost and efficiency practices that the official tooling has not yet systematized. This knowledge-sharing fills a gap between Anthropic's documented capabilities and the operational realities users encounter when deploying those capabilities at scale. It also presages the best-practice documentation and guardrails that mature agentic platforms will need to incorporate natively.
