Does Claude fail the Sally-Anne test for you sometimes?

Claude sometimes writes copy that assumes readers share context from the conversation, which makes the output incomprehensible to outside audiences. The behavior has been reported with Sonnet 4.6 and Opus 4.6, suggesting a possible limitation in how the model tracks what different readers actually know.

Detailed Analysis

A Reddit user on r/Anthropic describes a pattern observed in practice: when used for copywriting, Claude sometimes produces content that presupposes the reader shares the conversational context of the prompt session, writing as though the end audience has access to all the background information the user gave the model. The user reports that this occurs most frequently with Claude Sonnet 4.6 and Opus 4.6, and frames it through the lens of the Sally-Anne test, a classic false-belief task from developmental psychology. In the original Sally-Anne paradigm, a subject must correctly predict where Sally will look for a moved object based on Sally's outdated belief, not on what the subject knows to be true. The user's implication is that Claude is failing an analogous task: it knows what the user knows, but fails to model what the *reader*, an outside party, does not know.

The Sally-Anne test was developed in the 1980s to assess theory of mind (ToM), the cognitive ability to attribute mental states to others. It became foundational in autism research, where failures were historically interpreted as deficits in mentalizing. Contemporary research has complicated this picture considerably, noting that failures may stem from linguistic ambiguity, attention demands, or even the structural cues in how questions are posed, rather than a categorical absence of theory of mind. This nuance matters when the test is applied to large language models like Claude. LLMs reportedly tend to *pass* explicit Sally-Anne-style prompts, correctly identifying that Sally will look in her basket based on her false belief. But this success is often attributed to pattern recognition from training data rather than genuine belief attribution, a distinction that becomes critically relevant in applied, open-ended tasks like copywriting.

The gap between Claude passing a structured false-belief task and failing an implicit one in copywriting reveals something important about how context-sensitivity breaks down in practice. When a user spends a conversation elaborating on a product, a campaign brief, or a brand's backstory, that accumulated information saturates the model's working context. When Claude then generates copy for a general audience, it may fail to adequately model the epistemic state of a reader who has none of that context — essentially writing from inside the conversation rather than outside it. This is not a failure of explicit theory of mind reasoning but a failure of *spontaneous perspective-taking* in generative tasks, a distinction that cognitive scientists also observe in human performance: people often pass formal ToM tests while still defaulting to egocentric assumptions under cognitive load.

This phenomenon connects to broader challenges in deploying LLMs for professional communication tasks. Effective copywriting fundamentally requires holding two distinct knowledge states simultaneously: what the writer knows and what the audience knows. The more context a user provides to Claude in a session, the more the model's output risks being calibrated to that enriched internal state rather than to the zero-context reader. This is an emergent failure mode of the very feature — long-context retention and coherence — that makes Claude useful for complex tasks. It suggests that prompting strategies explicitly instructing Claude to "write for a reader who knows nothing about this conversation" may be necessary scaffolding, and that model developers may need to consider how perspective-switching instructions are weighted in copywriting-specific fine-tuning or system prompt design.
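As a rough illustration of that scaffolding, the sketch below wraps a copywriting task in an explicit outsider-reader instruction and passes along only the facts the copy is allowed to rely on. It is a minimal example under stated assumptions, not a tested remedy: the function name `build_outsider_prompt`, the constant `OUTSIDER_INSTRUCTION`, the exact wording, and the sample product are all hypothetical, and the code only builds a prompt string rather than calling any model API.

```python
# Hypothetical helper: names, wording, and sample data are illustrative assumptions.
OUTSIDER_INSTRUCTION = (
    "Write for a reader who knows nothing about this conversation. "
    "Define every product name, abbreviation, and reference the first time it appears."
)


def build_outsider_prompt(task: str, facts: list[str]) -> str:
    """Combine a copywriting task with only the facts the copy may rely on."""
    # One bullet per fact, so the permitted background is explicit and auditable.
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"{OUTSIDER_INSTRUCTION}\n\n"
        f"Task: {task}\n\n"
        f"Facts the copy may use (the reader has seen none of them):\n{fact_lines}"
    )


if __name__ == "__main__":
    # Example usage with made-up campaign details.
    prompt = build_outsider_prompt(
        task="Write a two-sentence product description for a landing page.",
        facts=[
            "The product is a reusable water bottle.",
            "It keeps drinks cold for 24 hours.",
        ],
    )
    print(prompt)
```

Sending the resulting string in a fresh session, rather than appending it to the long conversation, is one way to keep the accumulated context out of the model's view entirely; whether that helps in a given case is something to check against real output.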

More broadly, the Reddit post illustrates the gap between benchmark performance and real-world utility in AI systems. Claude's ability to answer "Where does Sally think the ball is?" correctly tells us relatively little about whether it will spontaneously adopt an outsider's epistemic vantage point while generating marketing copy across dozens of conversational turns. As AI models are increasingly integrated into professional workflows, particularly in content creation, customer communication, and marketing, this kind of tacit perspective-taking failure becomes a meaningful quality issue. It points toward a research and product design challenge: moving beyond measuring ToM as a discrete, elicitable capability and toward measuring it as a consistently applied, context-sensitive behavior embedded in open-ended generation.
