Careful with the new UltraCode, it's a mega token eater, and it's buggy. ~1.7 million tokens used with no output. There are no refunds for this.

A user experienced significant issues with the new UltraCode feature when subagents consumed approximately 1.7 million tokens without producing functional output, resulting in failed agents and a $18 charge with no refunds offered. Despite attempts at output caching, all eight agents were redeployed and consumed additional tokens, ultimately generating only a 12,000-word document with no actual code or work completed. The user warned that the system lacks supervisory safeguards to detect degenerate behavior and failure modes, leaving users vulnerable to substantial costs for minimal or no results.

Detailed Analysis

Anthropic's newly released UltraCode feature, a multi-agent swarm system built on Claude, is drawing criticism from early users after reports of runaway token consumption, degenerate agent loops, and zero productive output despite significant credit expenditure. In the incident described, eight subagents were deployed simultaneously to work on a coding repository, collectively consuming approximately 1.7 million tokens within minutes before one agent entered a broken loop state. When the orchestrating Claude agent attempted to recover by caching the outputs of the seven functioning agents and redeploying only the failed one, the cache operation silently failed, causing all eight agents to redeploy and consume another round of tokens. After roughly an hour and approximately 2 million total tokens consumed, the entire run produced only a 12,000-word summary document — no code written, no specified tasks completed — at a cost of approximately $18 in API credits.

The user's frustration is compounded not just by the financial loss but by Anthropic's refund policy, which the platform's customer service bot explicitly stated holds the company not responsible for degraded service and offers no credit refunds under any circumstances, including cases where the fault lies with Anthropic's own systems. While $18 is a modest sum in absolute terms, the incident illustrates a meaningful risk-surface problem: multi-agent architectures that can autonomously spawn subagents and consume tokens at high velocity lack adequate supervisory mechanisms to detect and terminate degenerate behavior before costs spiral. The absence of circuit-breaker logic, spend caps, or loop-detection safeguards in what appears to be an early-stage production feature represents a gap between feature ambition and operational robustness.

The incident fits into a broader pattern in the AI industry where agentic and multi-agent features are being rushed to market with insufficient guardrails around resource consumption and failure modes. Multi-agent orchestration introduces compounding failure risks that single-model interactions do not carry — one malfunctioning subagent can trigger cascading redeployments, especially when caching and state management are unreliable. Anthropic's Claude Agent SDK and related tooling have been positioning the company as a serious player in the agentic AI space, competing directly with OpenAI's operator ecosystem and Google's Gemini agents, but incidents like this highlight the operational maturity gap that remains between demonstrating agentic capability and deploying it safely at scale.

The lack of a supervisory meta-agent capable of identifying pathological behavior — a component that would seem elementary given that Anthropic's own research division has published extensively on AI safety and robustness — is particularly notable. Users engaging with swarm-style features reasonably expect that the orchestrating system can recognize when subagents are in infinite loops or producing no meaningful output, and can halt execution before costs accumulate. That this basic safeguard was absent in a publicly released product suggests either the feature shipped prematurely or the engineering priorities did not account for real-world failure edge cases that were, as the original poster notes, "extremely obvious." For developers and enterprises evaluating Anthropic's agentic tooling for production workflows, this episode underscores the importance of implementing external spend monitoring and hard token-consumption limits independently of whatever guardrails the platform itself provides.

Read original article →

Detailed Analysis

Don't Miss a Deploy