Anthropic gave Claude a $100 budget and told it to go shopping: Here's what it bought - The Times of India

Detailed Analysis

Anthropic's "Project Deal" experiment, conducted in December 2025, placed Claude AI agents in the role of autonomous economic actors, giving each of 69 employees a $100 budget to participate in a company-run classifieds marketplace entirely through Slack. The agents conducted 10-minute intake interviews with participants to understand their preferences, then operated with complete autonomy — posting listings, fielding offers, issuing counteroffers, and closing transactions without any human approvals or mid-process check-ins. The experiment generated 186 completed deals, more than 500 listed items, and over $4,000 in total transaction value, with goods exchanged ranging from snowboards to a plastic bag containing 19 ping-pong balls. Physical exchanges of items took place at the conclusion of the week-long trial.

The experiment's most striking finding concerned the performance gap between Claude models. Anthropic ran four parallel marketplaces to benchmark Claude Opus 4.5 against lighter models such as Haiku, and the results were pronounced. Opus 4.5 secured prices approximately 70% higher for sellers (the same broken folding bike, for example, fetched $65 in its marketplace versus $38 in a competing one), while buyers it represented paid an average of $2.45 less for equivalent items. This advantage on both sides of the market, depending on the agent's assigned role, suggests that frontier model capability translates directly and measurably into economic performance in real-world transactional contexts.
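As a quick sanity check on the folding-bike example, the short sketch below computes the relative premium implied by the two quoted prices. The function name `percent_premium` is purely illustrative, and using the lower price as the baseline is an assumption; the article does not say how the roughly 70% figure was derived.

```python
def percent_premium(higher: float, lower: float) -> float:
    """Return how much `higher` exceeds `lower`, as a percentage of `lower`.

    Assumption: the lower price is used as the baseline for the comparison.
    """
    if lower <= 0:
        raise ValueError("lower price must be positive")
    return (higher - lower) / lower * 100.0


# Prices quoted in the article for the same broken folding bike.
opus_price = 65.0   # price in the Opus 4.5 marketplace
other_price = 38.0  # price in the competing marketplace

premium = percent_premium(opus_price, other_price)
print(f"Premium: {premium:.1f}%")  # prints "Premium: 71.1%"
```

Under that baseline, the $65 versus $38 comparison works out to about 71%, which is consistent with the "approximately 70%" figure cited above.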

Participant reception offered a nuanced signal for the commercial potential of such systems. Fairness ratings clustered around the midpoint of a 1-7 scale, indicating that participants did not feel exploited by their AI negotiating counterparts, even when those agents were optimizing against them. A majority expressed willingness to repeat the experiment, and nearly half indicated openness to paying for such a service in a commercial context. These responses suggest that human comfort with AI-mediated economic agency may be maturing faster than previously anticipated, particularly when the AI's decisions are transparent and the human retains final physical control over the exchange.

Project Deal carries broader significance as a proof-of-concept for agentic AI in commerce. The traditional framing of AI assistants as tools that execute human instructions gives way here to a model in which AI agents represent human interests autonomously over extended timeframes, managing complex multi-party interactions with incomplete information. The negotiation outcomes exceeded Anthropic's own expectations, signaling that the gap between controlled AI capability demonstrations and genuine real-world commercial deployment is narrowing. This aligns with a wider industry trend in which AI labs are shifting focus from raw benchmark performance toward evaluations of agentic behavior, economic reasoning, and sustained task completion in open-ended environments.

The experiment also sits within a larger context of substantial infrastructure investment in Anthropic's ecosystem. Amazon's separate commitment of an additional $5 billion as part of a $100 billion cloud deal underscores the scale of resources being marshaled to support Claude's development and deployment capacity. As agentic applications like Project Deal move from internal experiments toward potential productization, the computational and infrastructure demands of running frontier models like Opus 4.5 autonomously across large user populations become a critical enabler. The convergence of demonstrated agentic capability with deepening capital investment positions Anthropic as a central player in the emerging landscape of AI-driven autonomous economic systems.
