Track cost and usage - Claude Code Docs

The Claude Agent SDK provides detailed token usage information and cost tracking for each interaction with Claude, with TypeScript and Python SDKs exposing the same data through different field naming conventions. The guide explains how to track total costs per query call, break down usage by individual steps and models, and leverage the SDK's automatic prompt caching to reduce expenses. Cost data is available on both successful and failed result messages, and developers should accumulate costs across multiple query calls since the SDK reports costs per individual call rather than at the session level.

Detailed Analysis

Anthropic's Claude Agent SDK documentation outlines a comprehensive framework for tracking token usage and costs across interactions with Claude, addressing a critical operational need for developers building agentic applications. The SDK exposes usage data at multiple levels of granularity: per-step token counts embedded in individual assistant messages, per-model breakdowns accessible via the `modelUsage` (TypeScript) or `model_usage` (Python) fields on result messages, and cumulative cost totals provided through `total_cost_usd` at the conclusion of each `query()` call. Both the TypeScript and Python SDKs surface identical underlying cost data, differing only in field naming conventions — a design choice that reduces conceptual overhead while preserving language-idiomatic interfaces. Critically, the SDK does not aggregate costs across multiple `query()` calls in a session, placing the responsibility for session-level and multi-user cost accumulation squarely on the implementing developer.

A technically notable aspect of the SDK's design concerns the handling of parallel tool execution. When Claude invokes multiple tools simultaneously within a single step, the resulting assistant messages share identical message IDs and usage objects. Without deduplication, naive summation of token counts across these messages would produce inflated totals. The documentation's recommended pattern — maintaining a set of seen message IDs and skipping duplicates — reflects a deliberate architectural trade-off in how the SDK surfaces parallelism to the developer. Similarly, the documentation acknowledges that output token counts can occasionally diverge across messages sharing the same ID, advising developers to use the highest observed value and to treat `total_cost_usd` from result messages as more reliable than manually summed per-step figures. Even so, the documentation explicitly characterizes these figures as estimates that may differ from actual billing, directing authoritative cost verification to the Claude Console.

The broader cost landscape for Claude Code, as reflected in supplementary research, reveals the significance of robust tracking tooling at enterprise scale. Average costs run approximately $13 per developer per active day, with enterprise deployments typically landing between $150 and $250 per developer per month — figures that make granular tracking not merely a developer convenience but a financial governance imperative. The Usage & Cost API extends tracking capabilities beyond individual SDK calls, enabling workspace-level aggregation, daily reporting, budget alerting, and cache efficiency analysis. The availability of third-party tooling such as the `ccusage` CLI — which parses local JSONL files generated by Claude Code — further illustrates the demand for flexible, multi-layer cost visibility across different deployment contexts and organizational structures.

The documentation's per-model breakdown feature carries particular strategic weight in the context of multi-agent architectures. As agentic systems increasingly orchestrate combinations of models — routing computationally intensive reasoning tasks to Claude Opus while delegating simpler subtasks to Claude Haiku — the ability to attribute token consumption to specific models becomes essential for optimizing cost-to-performance ratios. With Opus 4 priced at $15 per million input tokens and $75 per million output tokens versus Haiku 3.5 at $0.80 and $4 respectively, targeted model selection for subagents can yield order-of-magnitude cost reductions. The `modelUsage` map directly enables this optimization loop by making per-model cost attribution a first-class capability of the SDK rather than an afterthought requiring external instrumentation.

This documentation reflects a broader trend across the AI industry toward treating cost observability as a core engineering concern rather than a peripheral billing detail. As LLM-powered applications move from prototypes to production systems handling thousands of concurrent users and complex multi-step workflows, the infrastructure for tracking, attributing, and governing token expenditure becomes as critical as performance monitoring or error handling. Anthropic's decision to embed detailed usage data directly into the SDK's message stream — rather than relying solely on external dashboards — aligns with the emerging principle that cost signals should be available at the same granularity as functional outputs, enabling real-time control loops, automated budget enforcement, and fine-grained architectural optimization within agent-based systems.

Read original article →

Detailed Analysis

Don't Miss a Deploy