Detailed Analysis
Claude Code's effort levels (Low, Medium, High) and the Extended Thinking toggle found in Claude's web interface are related but technically distinct mechanisms for controlling how Claude allocates its reasoning resources. The effort parameter, introduced for Claude Opus 4.6 and Sonnet 4.6, is a unified control that governs total token expenditure across all components of a response — text output, tool calls, and internal reasoning alike. Extended Thinking, by contrast, is a specific feature that enables step-by-step reasoning rendered in dedicated `thinking` blocks before Claude produces its final answer. While the two systems interact closely, equating Low effort with "thinking off" and High effort with "Extended Thinking on" is an oversimplification that obscures meaningful architectural differences.
The effort parameter was designed to replace the now-deprecated `budget_tokens` configuration as the recommended mechanism for tuning thinking depth when using adaptive thinking mode (`thinking: {type: "adaptive"}`). At higher effort levels, Claude almost always engages in deep reasoning; at lower effort levels, it may skip thinking entirely for tasks it determines to be straightforward. This adaptive behavior is key: effort levels do not simply toggle extended thinking on or off, but rather calibrate the likelihood and depth of reasoning dynamically based on perceived task complexity. Lower effort settings also reduce token consumption across tool calls and text generation — making them useful for cost and latency optimization in agentic workflows, not just for controlling reasoning depth.
Extended Thinking, in its legacy form, relied on manual configuration with explicit `budget_tokens` values, with Anthropic recommending between 10,000 and 50,000 tokens for complex tasks. This manual mode is now deprecated on 4.6-generation models in favor of the adaptive approach governed by the effort parameter. The visible output of Extended Thinking — the `thinking` blocks that appear before the final response — remains a transparency feature that gives developers and users insight into Claude's reasoning chain, particularly valuable for mathematics, complex coding problems, and multi-step analysis tasks. The effort system effectively subsumes the functionality of manual `budget_tokens` configuration while adding broader scope over the entire response.
The confusion surfaced in this Reddit thread reflects a broader challenge in how Anthropic surfaces technical distinctions across different product surfaces. Claude Code, as a developer-facing CLI tool, exposes the effort parameter through its Low/Medium/High interface, while the web UI presents a simpler Extended Thinking toggle that abstracts away the underlying configuration. These different UI paradigms map onto the same underlying API machinery but communicate it through different conceptual frameworks, leading users to reasonably — if incorrectly — assume they are equivalent controls. Anthropic's documentation on the effort parameter and extended thinking exists across multiple separate pages, contributing to the fragmentation of understanding.
This distinction matters significantly as Anthropic pushes further into agentic and tool-heavy use cases. The effort parameter's ability to throttle not just thinking but also tool call frequency makes it a critical lever for managing the cost and latency of autonomous agents that may invoke dozens of tools in a single session. The deprecation of manual `budget_tokens` in favor of adaptive effort-based control signals Anthropic's broader architectural direction: moving away from static, manually-tuned reasoning budgets toward dynamic systems that allocate cognitive resources based on task demands. For developers building on top of Claude Code or the API, understanding this distinction is essential to optimizing both performance and cost in production deployments.
Read original article →