Anthropic runs Claude-powered employee marketplace experiment - Let's Data Science

Detailed Analysis

Anthropic conducted an internal experiment in December 2025 called Project Deal, deploying Claude AI agents to autonomously manage a classifieds-style marketplace among 69 employees at its San Francisco office. Each participant received a $100 budget in the form of gift cards. Claude first interviewed volunteers to understand their preferences, including what items they wished to buy or sell, acceptable price ranges, and negotiation styles, and those interviews became the basis for a custom system prompt for each participant's agent. The agents then operated independently within Slack, conducting multi-turn negotiations with no human intervention. The experiment produced 186 completed deals totaling over $4,000 in value, covering physical goods that were subsequently exchanged in person. Participants rated the outcomes as fair, and nearly half indicated they would be willing to pay for a similar AI-powered service in a commercial context, a notable signal of perceived utility.

The experiment also embedded a model comparison. Employees were assigned either Claude Opus 4.5 or Haiku 4.5 as their negotiating agent without being told which, allowing Anthropic to measure performance differences under realistic conditions. Opus 4.5 agents outperformed Haiku 4.5 agents across every measured dimension, with sellers earning $2.68 more per item on average, buyers saving $2.45 per item, and Opus agents completing 2.07 more deals overall. The experiment also produced an anecdotal curiosity: one agent purchased 19 ping-pong balls for itself, underscoring the degree of autonomous decision-making the agents exercised within the parameters of their instructions.

Project Deal was not Anthropic's only foray into AI-managed economic activity. A separate experiment called Project Vend tasked Claude Sonnet 3.7 with operating an in-office vending shop, equipped with tools for web search and simulated email communication. That test revealed meaningful limitations: the model made suboptimal pricing decisions and did not reliably improve through iteration. Together, the two experiments form a complementary picture — one demonstrating Claude's comparative strengths in negotiation and transactional coordination, the other exposing gaps in dynamic economic reasoning and adaptive learning over time. Both efforts reflect Anthropic's sustained interest in understanding how AI systems perform in applied, real-world economic roles.

The broader significance of these experiments lies in their relevance to the rapidly expanding deployment of AI agents in commercial and labor contexts. Anthropic's own Economic Index has reported that large language models cover approximately 75% of tasks associated with computer programming roles, but only around 33% of tasks across the broader computer and mathematical occupational category, highlighting persistent gaps where physical interaction or complex contextual reasoning is required. Project Deal's marketplace setting is notable precisely because negotiation combines linguistic fluency with strategic reasoning and value judgment, capabilities that sit near the frontier of what current models can credibly automate. The finding that model tier meaningfully affects negotiation outcomes has direct implications for enterprises considering AI deployment in procurement, sales, or trading functions, where small per-transaction advantages add up to significant sums at scale.
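To make the scale argument concrete, here is a rough back-of-the-envelope sketch in Python. The per-item figures ($2.68 and $2.45) are the ones reported above; the transaction volumes and the `aggregate_advantage` helper are purely illustrative assumptions, and the calculation assumes the per-item gap would stay constant at larger volumes, which the experiment does not establish.

```python
# Per-item advantages of Opus 4.5 over Haiku 4.5, as reported in the article.
SELLER_GAIN_PER_ITEM = 2.68   # dollars earned extra per item sold
BUYER_SAVING_PER_ITEM = 2.45  # dollars saved per item bought


def aggregate_advantage(per_item_gain: float, n_items: int) -> float:
    """Total dollar advantage if a fixed per-item gain held across n_items.

    Hypothetical helper: assumes the gap scales linearly with volume.
    """
    return round(per_item_gain * n_items, 2)


if __name__ == "__main__":
    # Hypothetical transaction volumes, chosen only for illustration.
    for volume in (100, 10_000, 1_000_000):
        seller_total = aggregate_advantage(SELLER_GAIN_PER_ITEM, volume)
        buyer_total = aggregate_advantage(BUYER_SAVING_PER_ITEM, volume)
        print(f"{volume:>9,} items: seller side ${seller_total:,.2f}, "
              f"buyer side ${buyer_total:,.2f}")
```

The totals grow linearly with volume under this assumption; any real deployment would also need to weigh them against the higher cost of running the larger model.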

The experiments collectively position Anthropic as not merely a model developer but an active researcher into AI's economic behavior and labor market implications. By staging these trials internally before any commercial release, the company generates empirical data on agent performance in low-stakes environments, which can inform both product development and safety research. The willingness of nearly half of participants to pay for such a service suggests genuine market demand, and the performance gap between Opus and Haiku models raises important questions about cost-benefit tradeoffs for businesses deploying AI agents — particularly in domains where negotiation outcomes have direct financial consequences. As agentic AI systems become more capable and more widely deployed, internal experiments of this kind serve as early evidence about the conditions under which AI can act as a reliable economic participant.
