Detailed Analysis
A Max-plan subscriber and content creator with over 1.2 million followers across social platforms documented a ten-day outage of Anthropic's Cowork feature, beginning May 26, 2026, in a detailed Reddit post. The post reveals both the technical mechanics of how Anthropic deploys experimental configurations to production users and apparent dysfunction in the company's customer support pipeline. The user's cloud connectors (including Notion, Granola, Slack, Gmail, and Figma) appeared connected in the UI but failed silently at runtime, while unrelated skills began auto-firing on loosely matched prompts. Through independent debugging with Claude Code, the user found that their account had been assigned two simultaneous A/B prompt variants, one labeled `0526` and one labeled `testfoo`, both of which instructed Claude to invoke installed skills "promiscuously when they seem at all relevant." Applied in duplicate, this instruction produced the observed misfiring. A separate feature flag, `tengu_mcp_retry_failed_remote: false`, combined with `ToolSearch` being disabled in Cowork sessions, rendered every cloud connector unreachable within Cowork specifically, while leaving Chat mode and Claude Code terminal sessions functional.
The diagnostic process surfaced a technically notable pattern: the broken configuration reproduced on a fresh Anthropic account, registered under a different email address, on the same machine. This suggests that variant assignment is tied to device fingerprinting rather than user ID. If experimental prompt variants and feature flag combinations are distributed based on hardware or network identity rather than account identity, standard account-level remediation such as password resets or account recreation would fail to resolve the issue, and the blast radius of a broken variant could be substantially wider than internal testing metrics would suggest. The name `testfoo`, a placeholder of the kind typically used during early prototyping, appearing in a production variant assignment further suggests that the configuration escaped internal staging controls without being renamed or formally reviewed for external deployment.
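To make the account-versus-device distinction concrete, here is a minimal sketch of deterministic hash bucketing, a common way to assign experiment variants. It is not Anthropic's implementation, which the source does not describe. The function names, the experiment name `skills_prompt`, the `control` variant, and the example identifiers are all hypothetical; only the labels `0526` and `testfoo` come from the user's report. The sketch also assigns a single variant per key, whereas the user reported two at once.

```python
import hashlib

# Hypothetical variant list; only "0526" and "testfoo" appear in the report.
VARIANTS = ["control", "0526", "testfoo"]


def assign_variant(key: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a bucketing key to one variant of an experiment."""
    digest = hashlib.sha256(f"{experiment}:{key}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def variant_by_account(account_id: str, device_id: str) -> str:
    """Account-keyed bucketing: the device identifier is ignored."""
    return assign_variant(account_id, "skills_prompt", VARIANTS)


def variant_by_device(account_id: str, device_id: str) -> str:
    """Device-keyed bucketing: the account identifier is ignored."""
    return assign_variant(device_id, "skills_prompt", VARIANTS)


if __name__ == "__main__":
    for account in ("first@example.com", "second@example.com"):
        print(account, variant_by_device(account, "device-123"))
```

Under device-keyed bucketing, every account used on `device-123` receives the same variant, so creating a new account cannot change the assignment. Under account-keyed bucketing, a new account is hashed independently and may land in a different bucket. The reproduction the user describes is consistent with the first scheme, though it does not prove it.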
The support interaction documented across the ten days is arguably as significant as the technical finding. The user sent five support emails and held two conversations with Anthropic's Fin AI assistant, which itself diagnosed the issue as requiring a human engineer to adjust feature flags, and received zero direct human replies. Yet account-level changes occurred on days four, six, and ten: the two prompt variants were silently removed in sequence and skill misfiring visibly improved, indicating that someone within Anthropic's engineering or support organization was actively working the ticket. The user's framing, "someone at Anthropic is working through my ticket, they're just not telling me," describes a support model in which remediation happens asynchronously through backend adjustments with no corresponding customer communication. The user corroborated this pattern with external reports on Trustpilot and in published blog posts.
This case connects to a broader tension in how AI platform companies handle the intersection of rapid experimentation and production user experience. A/B testing of prompt-level configurations is a standard practice for improving model behavior at scale, but the standard framework assumes that individual variant assignments are isolated, recoverable, and monitored for degradation signals. The `testfoo` incident suggests, at minimum, that Anthropic's variant naming and staging discipline did not prevent a development-phase configuration from reaching a paying user, and that the flag combination shipped alongside it created a compound failure mode that was not anticipated when each change was considered in isolation. The reproduction on a second account on the same device also raises the question of whether the user's visible complaint represents a much larger silent population experiencing the same degradation without the technical fluency to diagnose it.
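The isolation assumption described above can be expressed as a pre-deployment check. The sketch below is purely illustrative and assumes a hypothetical data model: the `Variant` class, the `layer` field, the placeholder-name pattern, and the `audit_assignment` function are inventions for this example, not part of any Anthropic tooling. The instruction string paraphrases the wording quoted in the user's report.

```python
import re
from dataclasses import dataclass

# Hypothetical heuristic for names that look like development placeholders.
PLACEHOLDER = re.compile(r"(test|foo|bar|tmp|todo)", re.IGNORECASE)


@dataclass(frozen=True)
class Variant:
    name: str
    layer: str  # assumption: variants in one layer must be mutually exclusive
    instructions: tuple[str, ...]


def audit_assignment(variants: list[Variant]) -> list[str]:
    """Return human-readable problems found in one account's variant set."""
    problems: list[str] = []
    layer_owner: dict[str, str] = {}
    instruction_owner: dict[str, str] = {}
    for v in variants:
        if PLACEHOLDER.search(v.name):
            problems.append(f"placeholder-style name: {v.name}")
        if v.layer in layer_owner:
            problems.append(
                f"layer {v.layer!r} has both {layer_owner[v.layer]} and {v.name}"
            )
        else:
            layer_owner[v.layer] = v.name
        for line in v.instructions:
            owner = instruction_owner.setdefault(line, v.name)
            if owner != v.name:
                problems.append(
                    f"instruction duplicated across {owner} and {v.name}: {line!r}"
                )
    return problems


if __name__ == "__main__":
    rule = "invoke skills promiscuously when they seem at all relevant"
    reported = [
        Variant("0526", "skills_prompt", (rule,)),
        Variant("testfoo", "skills_prompt", (rule,)),
    ]
    for problem in audit_assignment(reported):
        print(problem)
```

Run against a model of the reported assignment, the check flags three separate conditions: the placeholder-style name, two variants active in the same layer, and the same instruction supplied twice. Any one of these would be a reasonable signal to block a rollout, assuming such a gate existed.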
The public GitHub issue the user filed on May 29 received appropriate triage labels but no comments after six days, a response pattern consistent with open-source-adjacent issue trackers where labeling serves internal routing but does not constitute engagement. The remaining unresolved issue — a cloud connector flag that the user's own Claude Code diagnosis identifies as a single flag flip — illustrates a structural problem that scales poorly: technically sophisticated users can accurately identify the fix, file it through every available channel, and still be blocked entirely by the absence of a human escalation path. As Anthropic continues expanding Cowork and MCP-based integrations as core workflow tools for paid subscribers, the support infrastructure's inability to provide synchronous human communication for broken production configurations on its highest-tier plan represents a meaningful gap between product ambition and operational readiness.