Detailed Analysis
A Reddit user operating under a Max subscription on iOS has documented an investigation into how Anthropic's Claude handles Project Instructions — the customizable directives users can set to shape Claude's behavior across a project's conversations. Through a methodical approach of prompting Claude to reproduce its full system prompt verbatim and comparing outputs between project and non-project conversations, the user determined that Project Instructions are not dynamically re-injected at each conversational turn. Instead, they are loaded once into the system prompt at the moment a conversation begins and remain statically embedded in that context window for the duration of the session.
The more consequential finding concerns what occurs when a user modifies Project Instructions while a conversation is already underway. Because the instructions exist within the context window as a fixed artifact from session initialization, Claude does not receive a notification or signal that any change has occurred. When the updated instructions are rendered, Claude processes them as though they had always been present from the conversation's very first message. This produces a demonstrably disorienting result: if Claude previously followed one set of instructions and is then shown a different set retroactively occupying that same contextual position, it will interpret its earlier behavior as an error — concluding it failed to follow instructions that, in reality, did not exist at the time of that earlier response. The model, in effect, rewrites its own behavioral history to conform to the current state of the prompt.
A further finding is that Project Instructions carry no identifying label or metadata within the system prompt itself. Claude receives and follows the instructions as plain contextual content, but because nothing in the prompt explicitly marks them as "Project Instructions," querying Claude about its project instructions may yield the response that none exist. The model cannot introspect the source or categorization of its own system prompt contents — it can only act on what the text says, not on how Anthropic's infrastructure categorizes it. This reveals a meaningful gap between Claude's functional behavior and its capacity for accurate self-reporting about that behavior.
These findings illuminate a broader architectural reality of large language model deployments: the system prompt is a static document from the model's perspective, and the model has no privileged awareness of the external infrastructure that assembles or modifies it. This is not a bug unique to Claude — it reflects the fundamental design of transformer-based models, which process tokens without inherent awareness of their provenance. What makes this finding particularly relevant to Claude users is the practical implication for workflow design: mid-conversation instruction changes do not produce a clean state reset but instead create a kind of temporal inconsistency, where the model's apparent memory of earlier behavior is retroactively recontextualized by the new prompt state.
The methodology the user employed — prompting for verbatim system prompt reproduction and diffing the outputs — represents an accessible form of behavioral auditing that sits at the intersection of power-user practice and informal AI interpretability research. That a non-developer Max subscriber was able to reverse-engineer meaningful architectural details through careful prompt experimentation speaks to both the transparency risks and the investigative potential inherent in conversational AI systems. As Anthropic continues to develop features like Projects and persistent instructions, the disconnect between what the infrastructure does and what the model can accurately report about its own context window will likely remain an area of user scrutiny and, potentially, future product refinement.
Read original article →