Detailed Analysis
Anthropic's internal experiment, dubbed Project Deal, placed Claude AI agents in a real-money marketplace in December 2025. Each of 69 employees received a $100 budget, and Claude was tasked with autonomously completing transactions on their behalf. Participants first sat for a roughly ten-minute intake interview describing their buying and selling preferences. Claude then took full control of listing items, fielding inquiries, issuing counteroffers, and finalizing purchases, operating through a Slack-based platform without human intervention. By the experiment's conclusion, the agents had negotiated 186 deals across more than 500 listings, generating over $4,000 in total transactions. The goods exchanged ranged from snowboards and Halloween decorations to a bag of 19 ping-pong balls, which one Claude agent memorably described to a potential buyer as "perfectly spherical orbs of possibility."
The experiment doubled as a model benchmarking exercise. Anthropic ran different versions of Claude in parallel, including the flagship Claude Opus 4.5 and lighter-weight variants such as Haiku 4.5, without telling participants which model was managing their account. The performance gap between models proved financially meaningful: users represented by stronger models completed approximately two additional deals each, secured $2.68 more per item sold, and paid $2.45 less per item purchased. Despite these measurable differences, participants assigned to weaker models reported no awareness of the disadvantage, a finding that raises substantive questions about transparency and informed consent in autonomous AI deployments. The experiment also found that negotiation style (aggressive versus collaborative) had minimal bearing on results; seller outcomes correlated more strongly with initial listing prices than with tactical maneuvering.
User sentiment toward the experiment was cautiously positive. Participants rated fairness at roughly the midpoint of a seven-point scale, most said they would participate again, and 46 percent said they would pay for a comparable AI-mediated marketplace service. These figures suggest meaningful consumer appetite for autonomous AI agents that handle transactional tasks, even when participants lack full visibility into how those agents operate. The results point to a nuanced divide: users trusted the process enough to repeat it, yet their fairness ratings fell short of strongly positive, a gap that may be connected to the opaque model-assignment structure and the unequal financial outcomes it produced.
Project Deal is significant in the broader landscape of agentic AI development because it moves beyond synthetic benchmarks into messy, real-world economic behavior. Unlike controlled laboratory evaluations, the experiment involved genuine financial stakes, personal goods with sentimental value, and the kind of contextual reasoning (understanding that a seller values speed over maximum price, for instance) that rule-based systems struggle to replicate. The results support a core premise of multi-agent and agentic AI research: that large language models can navigate complex, open-ended tasks with minimal human supervision when given sufficient context and appropriate tools.
The experiment also surfaces a tension likely to become increasingly central as autonomous AI agents are deployed at scale. When AI systems handle consequential decisions on behalf of users, such as financial negotiations, scheduling, or procurement, and those users receive meaningfully different outcomes depending on which model version they were assigned, the ethical and regulatory implications are substantial. Anthropic's willingness to publish these findings, including the uncomfortable data on model-driven inequality, makes Project Deal both a proof of concept and a candid admission that capability asymmetries in AI deployments can produce real-world disparities of which users may be entirely unaware. As agentic AI moves from internal experiments to commercial products, establishing norms around disclosure, model parity, and outcome accountability is likely to be among the industry's defining challenges.