4.8 Is a D1 Yapper — Claude Learning Daily

In a comparison of Claude 4.6 and 4.8 on a complex planning task, 4.6 used extended thinking without displaying its reasoning process and produced a widget-based solution, whereas 4.8 printed its thinking line by line and delivered its output as a table. The poster found the difference surprising, since output tokens are the most expensive part of usage.

Detailed Analysis

A Reddit user in the r/Anthropic community has flagged an apparent behavioral divergence between Claude 4.6 and Claude 4.8 in how each model handles reasoning during complex planning tasks. According to the post, Claude 4.6 reasoned silently, working through the problem without surfacing intermediate thoughts, and then delivered a polished, interactive widget as its final output. Claude 4.8, by contrast, appeared to stream its reasoning line by line directly into the visible response and ultimately produced a standard table rather than the more refined format its predecessor delivered. The user calls this behavior "odd," particularly because output tokens are the most expensive component of API usage.

The distinction being described maps onto a technical feature in Anthropic's model lineup: extended thinking, introduced as an optional capability with Claude 3.7 Sonnet and carried forward in subsequent releases. With extended thinking, the model's chain of thought is generated in dedicated thinking blocks, separate from the final answer, which an interface can collapse or hide. When a model instead writes its reasoning into the response itself, that reasoning appears inline with the result. The user's observation suggests that Claude 4.8 may default to, or more readily fall into, visible step-by-step reasoning in a way that Claude 4.6 does not, even when the user has not requested it. That would be a meaningful change in default behavior rather than a difference in capability.
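With extended thinking enabled, the Anthropic Messages API returns reasoning in content blocks of type `thinking`, alongside the `text` blocks that carry the answer, which is what lets a client show only the answer. The sketch below assumes a response already parsed into a Python dict; the `split_response` helper and the sample payload are hypothetical illustrations, not real model output, and real responses can also contain other block types such as `redacted_thinking` or `tool_use`.

```python
from typing import Any


def split_response(response: dict[str, Any]) -> tuple[list[str], str]:
    """Hypothetical helper: return (thinking_texts, visible_text) from content blocks."""
    thinking: list[str] = []
    visible: list[str] = []
    for block in response.get("content", []):
        if block["type"] == "thinking":
            # Reasoning a client may collapse or hide.
            thinking.append(block["thinking"])
        elif block["type"] == "text":
            # The answer shown to the user.
            visible.append(block["text"])
        # Other block types (e.g. redacted_thinking, tool_use) are ignored here.
    return thinking, "".join(visible)


# Hand-written sample shaped like a Messages API response (values are made up).
example = {
    "content": [
        {"type": "thinking", "thinking": "Compare options A and B...",
         "signature": "example-signature"},
        {"type": "text", "text": "| Option | Cost |\n|---|---|\n| A | low |"},
    ],
    "usage": {"input_tokens": 1200, "output_tokens": 950},
}

thoughts, answer = split_response(example)
print(len(thoughts))            # 1
print(answer.splitlines()[0])   # | Option | Cost |
```

If a model writes its reasoning into the `text` block instead, as the post describes for 4.8, a split like this has nothing to separate, which is why a change in default behavior matters to anyone building an interface on top of the model.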

The cost concern raised in the post is substantive. Anthropic bills output tokens at a significantly higher rate than input tokens across its model tiers. Extended thinking tokens are billed as output tokens whether or not they are displayed, and reasoning written directly into the visible response is billed as output as well. A model that routinely externalizes its reasoning on complex tasks, especially if that written-out reasoning comes on top of hidden thinking rather than replacing it, could therefore generate unexpectedly high costs for developers and users who budgeted for concise responses, particularly in agentic or planning-heavy workflows where extended deliberation is common. If Claude 4.8 is systematically more verbose in this way by default, it could affect both the economics and the user experience of applications built on it.
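To see how quickly extra output dominates the bill, consider two hypothetical requests with identical input: one returns only the answer, the other also writes several thousand tokens of reasoning. The per-million-token prices below are illustrative assumptions with a 5:1 output-to-input ratio, not quoted rates for any specific model, and `request_cost` is a hypothetical helper.

```python
# Illustrative prices in USD per million tokens (assumptions, not published rates).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request; any thinking tokens are counted in output_tokens."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


# Same prompt and same final answer; the second request also emits
# 3,000 tokens of reasoning (hypothetical token counts).
answer_only = request_cost(input_tokens=2_000, output_tokens=1_000)
with_reasoning = request_cost(input_tokens=2_000, output_tokens=4_000)

print(f"{answer_only:.3f}")                   # 0.021
print(f"{with_reasoning:.3f}")                # 0.066
print(f"{with_reasoning / answer_only:.1f}x") # 3.1x
```

Because output is priced several times higher than input, tripling the input would barely move these numbers, while the added reasoning roughly triples the cost of the request.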

This observation connects to a broader tension in the development of reasoning-capable AI models. Exposing chain-of-thought output has genuine utility: it aids interpretability, lets users verify the model's logic, and can improve trust in high-stakes decisions. It also increases latency, cost, and interface complexity, particularly in consumer-facing contexts where users expect clean, direct output. The difference in output format, a functional widget from 4.6 versus a plain table from 4.8, further suggests that the models differ not only in how much reasoning they show but in how they judge the appropriate presentation of results. If that holds beyond a single report, it points to meaningful variation in instruction-following and output formatting between minor versions of the same model family.
