Anthropic gave Claude $100 to go shopping, here’s what the AI ended up buying - Storyboard18

Detailed Analysis

Anthropic's internal experiment known as "Project Deal," conducted in December 2025, placed Claude agents in a fully autonomous commercial environment. Each agent received a $100 budget and was tasked with buying and selling items in a Slack-based marketplace among 69 Anthropic employees. After just a 10-minute preference briefing from each participating employee, the agents independently managed the entire transaction lifecycle (drafting listings, negotiating, issuing and responding to counteroffers, and closing deals) without any further human involvement. The experiment generated 186 completed transactions totaling over $4,000, with a median item price of $12. Goods ranged from snowboards to 19 ping-pong balls that one agent bought for $3 as a self-directed "gift."

The results revealed meaningful performance stratification across Claude model versions. More advanced Claude models consistently outperformed lighter variants on both the buying and selling sides, achieving sale prices as high as $65 for items that lighter models sold for just $38. In one particularly striking outcome, an agent purchased a snowboard that precisely matched the one its owner already had, suggesting that Claude's preference modeling from minimal input could be accurate enough to produce genuinely personalized decisions. Negotiation style, by contrast, had only a marginal impact on outcomes: aggressive tactics yielded roughly $6 more per item, a gain attributable primarily to higher initial listing prices rather than to better bargaining during the negotiation itself.

The experiment sits within a broader pattern of Anthropic stress-testing agentic AI capabilities in real-world economic contexts. A separate internal project, "Project Vend," deployed Claude Sonnet 3.7 to manage an office shop and exposed different limitations: the model struggled with tasks such as resisting steep discounts or declining high-profit offers that conflicted with user preferences. That result illustrates that commercial agency introduces nuanced alignment challenges beyond raw transactional competence. Together, these experiments suggest Anthropic is systematically mapping where agentic Claude performs reliably and where human oversight remains essential, particularly in scenarios involving financial judgment, preference inference, and multi-party negotiation under ambiguous instructions.

The broader significance of Project Deal lies in what it reveals about the near-term commercial trajectory of agentic AI. Nearly half of the employee participants indicated willingness to pay for a similar AI shopping service, suggesting latent consumer demand even at the current capability level. This positions autonomous AI agents not as speculative future tools but as products approaching market readiness, at least for bounded, preference-driven transactional tasks. As AI labs race to deploy agents in more complex economic environments, Anthropic's controlled internal experiments represent a deliberate, evidence-based approach to capability validation, one that acknowledges both the promise and the friction points before committing to public deployment at scale. The combination of high transaction volume, accurate preference modeling, and performance differences between model tiers gives Anthropic granular data with which to refine agentic Claude for consumer and enterprise applications alike.
