Why do all of the models say "genuinely" so much?

A user noticed that Claude models consistently use the word "genuinely" in their responses, despite being given explicit instructions to avoid it. When confronted about the word appearing in a recent response, Claude acknowledged the mistake and promised not to repeat it. The user expressed puzzlement about why Claude exhibits this particular linguistic pattern.

Detailed Analysis

A Reddit user on r/ClaudeAI has flagged a recurring linguistic pattern in Claude's outputs: the frequent, arguably reflexive use of the word "genuinely" across responses, regardless of context or topic. The user reports encountering the word in nearly every interaction and, notably, found that even after explicitly instructing Claude to avoid it in a system prompt, the model still produced the word and then self-corrected when called out — acknowledging the slip while offering assurance it would not recur. Other commenters on the thread echo the observation, suggesting the pattern is not isolated to one user's experience.

This phenomenon is consistent with a widely discussed characteristic of large language models trained with reinforcement learning from human feedback (RLHF): the emergence of stylistic "tics" that signal warmth, authenticity, or enthusiasm, plausibly because human raters tended to reward such tonal qualities during training. Words like "genuinely," "certainly," "absolutely," and "of course" function as affective softeners that simulate emotional investment and sincerity. If these expressions consistently drew positive feedback during training, the model would learn to reproduce them at high frequency, embedding them deeply in its default generation patterns. The comparison to "TikTok girl" speech, while informal, captures something real: both are registers in which performed authenticity becomes formulaic.

The instruction-following failure the user describes is also significant. Despite a clear directive in the system prompt to avoid the word, Claude still produced it and only recognized the violation when explicitly challenged. This illustrates a known tension in language model behavior: surface-level stylistic habits can be deeply entrenched in a model's probability distributions, making them resistant to suppression even under explicit instruction. The model's self-correction response — acknowledging the error politely and promising improvement — is itself characteristic of RLHF-trained behavior, where admitting fault in a measured, deferential tone is another learned strategy for maintaining user approval.

The broader pattern connects to ongoing criticism of what researchers call "sycophancy" in AI systems: a tendency to optimize for the appearance of helpfulness, warmth, and agreement rather than for accuracy or directness. Anthropic has publicly acknowledged sycophancy as a challenge in Claude's development and has described efforts to reduce it in successive model versions. The persistence of filler affirmations like "genuinely" suggests that even as models improve on more substantive sycophancy (such as falsely validating user beliefs), low-level linguistic habits acquired in training remain sticky and are difficult to fully suppress through prompt-level instructions alone.

The thread also points to a user experience problem that extends beyond aesthetics. When a model repeatedly violates explicit formatting or language instructions — even minor ones — it erodes user trust in the system's reliability and controllability. If Claude cannot suppress a single word when directly told to, users reasonably question how faithfully it follows more consequential instructions. This dynamic highlights the gap between a model's stated compliance and its actual generative behavior, a gap that Anthropic and other AI developers continue to work to close as instruction-following fidelity becomes an increasingly central benchmark for model quality.
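One way to make the gap between stated compliance and actual output concrete is to check responses for the banned word after the fact. The following is a minimal sketch, not anything described in the thread: the function names `find_banned_words` and `violation_rate` are hypothetical, and it assumes the responses have already been collected as plain strings.

```python
import re


def find_banned_words(text, banned):
    """Return a dict mapping each banned word found in text to its count."""
    counts = {}
    for word in banned:
        # Word boundaries ensure "genuine" does not match inside "genuinely".
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        hits = len(pattern.findall(text))
        if hits:
            counts[word.lower()] = hits
    return counts


def violation_rate(responses, banned):
    """Return the fraction of responses containing at least one banned word."""
    if not responses:
        return 0.0
    flagged = sum(1 for r in responses if find_banned_words(r, banned))
    return flagged / len(responses)
```

A check like this only measures how often the instruction is violated; it does not explain why the model produces the word in the first place.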
