Detailed Analysis
A Reddit user posting to r/ClaudeAI reports an encounter with Claude Sonnet 4.6 in which the model confidently provided factually incorrect information about its own tooling, only reversing course after the user challenged the initial response. The interaction centered on a straightforward question about Claude Code's local chat history storage: the model first asserted that "Claude Code chat history is not persisted or retrievable — this is a known limitation of the tool," then, upon being pushed back by the user who suspected a folder rename had simply obscured the files, correctly identified that conversation logs are stored locally at `C:\Users\username\.claude\projects\`. The model subsequently acknowledged the error directly, stating it "was wrong to say chat history 'is not persisted or retrievable'" and apologizing for the misleading initial response. The user is running the model at high effort settings and frames the incident not as an isolated mistake but as part of a broader pattern of what they describe as increasing inaccuracy on simple questions over time.
The significance of this incident lies less in the specific factual error and more in what it reveals about the model's failure mode. Claude did not express uncertainty or hedge its initial answer — it delivered a confidently wrong claim with the rhetorical framing of institutional knowledge ("this is a known limitation of the tool"), even pointing the user toward alternative workarounds that implied the correct answer was impossible. This is a well-documented failure pattern in large language models sometimes called "sycophantic confidence" or hallucination under certainty, where a model presents incorrect information with the same authoritative register it uses for verified facts. The model's ability to self-correct upon light user pressure — without any new information being provided — suggests the correct answer was accessible within its knowledge but was overridden in the initial response by some other inference pattern.
The user's broader claim, that Claude Sonnet 4.6 has been giving "many more incorrect answers to simple questions" over time, touches on a persistent and contested debate within AI user communities regarding model degradation. Users have periodically reported perceived performance dips following model updates or infrastructure changes, though such reports are difficult to validate systematically given the lack of personal performance baselines and the natural variability in LLM outputs. Anthropic has not publicly acknowledged systematic degradation in Sonnet 4.6, and the phenomenon the user describes — where initial wrong answers are corrected under pressure — is consistent with the model's baseline behavior rather than a novel regression. Still, the complaint resonates with a recurring community concern that successive model iterations may optimize certain benchmarks while introducing regressions in practical, conversational reliability.
This incident also highlights a specific vulnerability in AI-assisted developer tooling: when a model is asked about its own ecosystem's technical specifics — file paths, configuration structures, storage behaviors — it may generate plausible-sounding but incorrect answers that users with less domain familiarity would accept without challenge. In this case, the user's prior experience with the tool gave them grounds to push back; a newer user would likely have accepted the initial incorrect response and spent time pursuing dead-end workarounds. As Claude Code and similar agentic development tools expand their user base, the cost of confident misinformation about the tools themselves becomes more consequential, since users are often consulting the model precisely because they lack independent knowledge to verify the answers.
The incident underscores a structural tension in deploying high-confidence language models in technical support contexts. While effort and reasoning settings can improve output quality on complex tasks, they do not inherently eliminate confidently stated factual errors on simpler, more factual queries — and may in some cases reinforce authoritative framing regardless of underlying accuracy. The model's quick self-correction upon light challenge suggests that retrieval-augmented or verification-layered architectures, rather than effort scaling alone, may be necessary to close the gap between perceived and actual reliability in factual domains.
Read original article →