My Cowork has been broken for 48 hours. I dug into the session files and found my Max account is enrolled in a prompt variant "testfoo"?

A user experienced a 48-hour outage with Cowork in which prompts fired incorrect skills and connectors failed to load, despite being functional in Chat mode. Investigation revealed the account was simultaneously enrolled in two A/B test prompt variants, including one named "testfoo" (a developer placeholder), both containing identical directives that caused the skill router to malfunction. An additional configuration error disabled the ToolSearch mechanism needed for deferred tool loading, and while Anthropic's AI support confirmed the issue required manual adjustment of feature flags, that human escalation had not yet occurred.

Detailed Analysis

A Claude Max plan subscriber reports a 48-hour outage of Anthropic's Cowork feature, characterized by systematic failures across multiple integrated tools including Granola, Notion, Figma, and Slack. Through direct inspection of local session JSON files, feature flag caches, and the application's system prompt, the user identified what appears to be a concrete technical root cause: simultaneous enrollment in two A/B testing prompt variants, one named `testfoo` and one named `0526`. The `testfoo` label is particularly notable, as it suggests a developer placeholder that was never meant to reach production users. Both variants contain identical language instructing the model to invoke user skills "promiscuously," and the user's hypothesis is that their concurrent application effectively doubles the weighting of that directive, causing the skill auto-router to trigger on weak keyword signals. Compounding the issue, a `ToolSearch` mechanism required for deferred-tool-loading is simultaneously disabled on the account, resulting in connected integrations that appear active but expose no usable tools within Cowork sessions.

The incident reveals a meaningful gap between Anthropic's internal testing infrastructure and its production safeguards. A/B prompt variant experiments are a standard practice in large-scale AI product development, but the presence of a variant named `testfoo` in a paying user's session suggests that staging and production environments may share flag distribution mechanisms without sufficient gatekeeping. The user also notes that Anthropic's own support AI, Fin, acknowledged the misconfiguration and flagged it for human engineer escalation — yet that escalation had not occurred within the reporting window, pointing to an operational gap in the support-to-engineering handoff process for flag-level account issues. The fact that local cache overrides and file edits are silently overwritten on resync further removes any self-service remediation path, leaving affected users fully dependent on backend intervention.

The broader significance of this report lies in what it exposes about the architectural complexity of AI-native productivity platforms. Cowork represents Anthropic's push into agentic, multi-tool workflow environments where Claude serves as an orchestration layer across third-party services via MCP (Model Context Protocol) connectors. Unlike traditional SaaS products where a broken feature is typically isolated, failures in this architecture propagate across every connected service simultaneously — a single misconfigured feature flag can render an entire productivity stack non-functional. The user's citation of five matching GitHub issues, spanning MCP tool exposure failures, silent connector failures, and marketplace migration race conditions, suggests these are not isolated edge cases but symptoms of systemic instability in the Cowork product layer during a period of active development.

This episode also highlights an emerging accountability challenge for AI platform providers as they move upmarket with premium subscription tiers. The user is on a Max plan, indicating a higher-cost commitment where reliability expectations are correspondingly elevated. The inability to resolve or even acknowledge a backend configuration error within 48 hours, for a user whose entire workflow dependency on the product is documented and escalated, represents a support and reliability posture that may not scale as Anthropic competes with enterprise-oriented platforms from OpenAI, Google, and Microsoft. As agentic AI tools become more deeply embedded in professional workflows, the blast radius of production incidents grows proportionally, and the tolerance for unresolved flag-level bugs affecting paying users diminishes accordingly.

Read original article →

Detailed Analysis

Don't Miss a Deploy