Detailed Analysis
A developer analyzing 129 Claude Code session transcripts has published a granular breakdown of how output tokens are distributed across task types. Reasoning and dialogue dominate at 58% of total token consumption, while web search (0.3%), code editing (1.4%), and agent dispatch (0.7%) take comparatively marginal shares. The analysis used a roughly 50-line Python script that iterates through JSONL session files stored in `~/.claude/projects/<project-hash>/`, reads per-turn token counts from the `usage` field, identifies the dominant tool type from `tool_use` blocks in the `content` array, and buckets turns without tool calls as reasoning and dialogue. The result is a rare empirical look at how a power user's Claude Code consumption actually breaks down at the token level, as opposed to how they assume it breaks down.
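The bucketing approach described above can be sketched in a few dozen lines. This is not the author's script, only a minimal illustration under stated assumptions: that each JSONL line is a JSON object whose `usage` and `content` fields sit either at the top level or under a `message` key, that output tokens are recorded as `output_tokens`, and that the tool names in `TOOL_BUCKETS` match what the transcripts contain. The mapping, the bucket labels, and all function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical mapping from tool names to report buckets. The tool names
# are assumptions and may not match a given transcript.
TOOL_BUCKETS = {
    "WebSearch": "web search",
    "WebFetch": "web search",
    "Edit": "code editing",
    "Write": "code editing",
    "Task": "agent dispatch",
}
NO_TOOL = "reasoning and dialogue"


def classify_turn(content):
    """Pick a bucket from the most frequent tool_use block in one turn."""
    if not isinstance(content, list):
        return NO_TOOL
    tools = [
        block.get("name", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "tool_use"
    ]
    if not tools:
        return NO_TOOL
    dominant = Counter(tools).most_common(1)[0][0]
    return TOOL_BUCKETS.get(dominant, "other tools")


def tally_lines(lines, totals=None):
    """Add output tokens per bucket for an iterable of JSONL lines."""
    totals = Counter() if totals is None else totals
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting
        if not isinstance(record, dict):
            continue
        # Assumption: fields may be nested under "message" or top-level.
        message = record.get("message", record)
        if not isinstance(message, dict):
            continue
        usage = message.get("usage")
        if not isinstance(usage, dict):
            continue  # not a turn that carries token counts
        tokens = usage.get("output_tokens", 0)
        if isinstance(tokens, int):
            totals[classify_turn(message.get("content"))] += tokens
    return totals


def tally_project(project_dir):
    """Tally every *.jsonl session file directly inside project_dir."""
    totals = Counter()
    for path in sorted(Path(project_dir).expanduser().glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            tally_lines(handle, totals)
    return totals


def shares(totals):
    """Convert raw token totals into percentage shares."""
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {bucket: 100 * count / grand_total for bucket, count in totals.items()}
```

Calling `shares(tally_project(path))` on one project directory under `~/.claude/projects/` would yield a per-bucket percentage table comparable in shape to the figures quoted above. Ties between tools within a turn resolve to whichever tool appears first, which is one of several defensible choices.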
The most diagnostically interesting finding is the gap between the author's self-reported workflow and the token data. The stated ambition to "research prior work first" is hard to square with a 0.3% web search allocation, which suggests that grounding in external sources is either happening through other means or not occurring as often as the user assumes. This kind of discrepancy between perceived and actual behavior is a known challenge in any knowledge workflow, but Claude Code's structured JSONL transcripts make it unusually tractable to audit. The 58% reasoning and dialogue figure is harder to evaluate in isolation: it could reflect genuinely productive deliberation, or it could indicate that the user is engaging the model conversationally when dispatching discrete tasks would serve better.
The low agent dispatch figure of 0.7% deserves particular attention given how agentic AI tooling is designed to be used. Claude Code supports multi-agent orchestration, with subagents handling parallelizable or specialized workloads while a primary session maintains high-level coordination. A dispatch share that low, combined with a dominant reasoning-and-dialogue bucket, suggests the user may be concentrating work in a single session context that could be decomposed more efficiently. The 1.4% code editing figure adds a further dimension: if the tool is being used heavily for thought but lightly for direct artifact production, whether that ratio is efficient or wasteful depends entirely on whether the dialogue feeds high-quality outputs produced elsewhere.
This kind of self-auditing methodology points toward a broader emerging practice in AI-assisted development: using the model's own session artifacts to evaluate and improve how the model is being used. As Claude Code and similar agentic coding environments accumulate rich structured logs, developers gain access to a new layer of workflow observability that did not exist with traditional IDEs or earlier chatbot interfaces. The author's Python script is a lightweight prototype of what could eventually be a first-class analytics feature — one that surfaces inefficiencies in tool-use patterns, highlights over-reliance on conversational interaction, and nudges users toward more deliberate orchestration strategies. The fact that this analysis emerged organically from a single user's curiosity rather than from Anthropic tooling underscores how much latent signal is already embedded in Claude Code's output, waiting to be read.