Detailed Analysis
A security breach involving an AI platform called Lily, used daily by approximately 70% of McKinsey's 40,000 consultants, exposed fundamental vulnerabilities in how enterprise organizations are building and deploying agentic AI systems. The exploit, disclosed by a startup called Codewall on March 9, 2026, required only $20 and no insider credentials for an autonomous agent to gain full read and write access to tens of millions of chat messages, tens of thousands of user accounts, and every system prompt governing the platform's reasoning behavior. The attack vector was SQL injection, a technique first documented in 1998 and covered in every introductory cybersecurity course. Twenty-two of the platform's 200 API endpoints shipped to production with no authentication whatsoever, including at least one endpoint that permitted production write access — meaning an attacker could have silently rewritten the AI's advisory behavior across one of the world's most influential consulting firms.
The article's central argument is that framing the Lily incident as a technical hygiene failure fundamentally misdiagnoses the problem. McKinsey employs highly capable engineers who understand how to authenticate an endpoint. The presence of 22 unauthenticated endpoints is not consistent with individual carelessness; it is consistent with a systemic organizational pattern in which the design and deployment of production software did not incorporate any serious consideration of agentic threat models. When Lily was first deployed two years ago, autonomous AI agents capable of traversing public endpoints to reach production data were not a realistic operational assumption. They are now. The root cause, as the author frames it, is not a forgotten checklist item but rather the absence of any architectural question about whether the API's shape was appropriate for a world in which AI agents would routinely interact with it.
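To make the architectural point concrete, here is an added illustration (not code from the article or from the Lily platform) of the design the author's argument implies: authentication enforced once in a central dispatcher that fails closed, with public routes listed explicitly, queries parameterised so user input is never spliced into SQL, and a deploy-time audit that lists every route reachable without a token. The in-memory table, token store and route names are constructed placeholders.

```python
import sqlite3

# Constructed in-memory store standing in for the platform's chat database.
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE messages (id INTEGER, owner TEXT, body TEXT)")
DB.executemany("INSERT INTO messages VALUES (?, ?, ?)",
               [(1, "alice", "q1 draft"), (2, "bob", "client memo"), (3, "alice", "notes")])

# Hypothetical token store; a real service would validate a signed session instead.
VALID_TOKENS = {"tok-alice": "alice"}
PUBLIC = {("GET", "/health")}   # the only routes allowed to skip auth, listed explicitly
ROUTES = {}                     # (method, path) -> handler

def route(method, path):
    def register(fn):
        ROUTES[(method, path)] = fn
        return fn
    return register

@route("GET", "/health")
def health(user, params):
    return 200, "ok"

@route("GET", "/messages")
def list_messages(user, params):
    # Parameterised query: the filter text is bound, never interpolated into the SQL string.
    rows = DB.execute("SELECT id, body FROM messages WHERE owner = ? AND body LIKE ? ORDER BY id",
                      (user, "%" + params.get("q", "") + "%")).fetchall()
    return 200, rows

@route("PUT", "/system-prompt")
def set_prompt(user, params):
    return 200, "updated"   # a write path: must never be reachable anonymously

def dispatch(method, path, token=None, params=None):
    """Central fail-closed gate: every route requires a valid token unless explicitly public."""
    handler = ROUTES.get((method, path))
    if handler is None:
        return 404, "not found"
    user = VALID_TOKENS.get(token)
    if (method, path) not in PUBLIC and user is None:
        return 401, "authentication required"
    return handler(user, params or {})

def audit_unauthenticated_routes():
    """Deploy-time check: every route reachable without a token, flagged if it can write."""
    writes = {"POST", "PUT", "PATCH", "DELETE"}
    return [(m, p, m in writes) for (m, p) in sorted(ROUTES) if (m, p) in PUBLIC]
```

Because the gate sits in `dispatch` rather than on each handler, adding a route without thinking about auth yields a 401 rather than an open endpoint, and the audit output is what a release check would compare against the intended public allowlist.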
This failure pattern points to a broader structural tension in enterprise AI procurement: business and executive teams operating under deadline pressure are frequently making purchasing and deployment decisions for AI platforms without sufficient technical participation to anticipate how agentic capabilities will interact with existing software infrastructure. The author explicitly expresses sympathy for McKinsey, arguing that the company's profile and the vividness of the consequences made it the public version of a failure shape that appears widely across enterprise AI programs in 2026. The procurement sequence most organizations use — evaluate product, sign contract, deploy — was designed for conventional SaaS, not for systems that expose APIs to increasingly capable autonomous agents that can probe, traverse, and exploit weakly bounded surfaces at scale and low cost.
The incident arrives at a moment when both Anthropic and OpenAI have been expanding their enterprise offerings and pushing agentic capabilities into production environments, making the underlying governance question increasingly urgent. The $20 price point of the attack is itself significant: as frontier model access becomes cheaper and agent frameworks become more capable, the cost of sophisticated automated exploitation drops toward near-zero. This changes the security calculus for any organization operating AI platforms with external-facing APIs. The traditional assumption that complexity or obscurity provides meaningful protection collapses when an agent can systematically probe hundreds of endpoints in minutes. What the Lily breach illustrates is that model capability and platform security are not separable concerns — a more capable AI platform with poorly governed infrastructure does not merely fail to deliver value; it actively creates asymmetric risk, converting organizational data and behavioral control surfaces into liabilities that can be exploited far more efficiently than they can be defended without deliberate architectural intervention.