The One AI Writing Hack Nobody Talks About.

Sullivan & Cromwell's court filing containing AI-generated fabricated citations is used to argue that hallucinations stem from disorganized working environments rather than model limitations. Modern AI agents can now operate effectively on file systems, which enables a workflow whose first step is organizing source materials into a structured data room before generating any deliverables. This structural approach does not eliminate hallucinations, but it reduces them at scale and makes them easier to catch by ensuring the AI inventories and understands the source materials before synthesis.

Detailed Analysis

Sullivan & Cromwell's filing of an AI-generated legal brief containing dozens of fabricated or miscited citations in a federal bankruptcy proceeding has become a focal point for a broader argument about how organizations are failing to design safe agentic workflows. The incident, which required a partner and co-head of the firm's restructuring practice to submit an apology to the court, illustrates what the article's author characterizes as a new category of failure: not the naive hallucinations of an individual user mishandling an early-generation chatbot, but structural breakdowns embedded in professional AI pipelines at firms with access to the most sophisticated tooling available. The citations were properly formatted, the motion structurally coherent, and yet the underlying factual content was wrong — and no member of the legal team caught the errors before filing. The author uses this as evidence that prompt-level interventions, such as instructing a model to avoid hallucinating, are categorically insufficient, arguing that language models have no internal truth-verification mechanism that such instructions can activate.

The article's central claim is that a meaningful reduction in hallucination risk requires a fundamentally different workflow architecture, one that became newly viable with the release of Claude Opus 4.7 and OpenAI's o3/o4-generation GPT models (referred to colloquially as "5.5"). These models, the author argues, have crossed a threshold in their ability to perform long-running agentic tasks directly on a file system: walking folder trees, opening documents, comparing metadata, and cross-referencing dates across files. The author frames this not as a glamorous capability but as a foundational primitive that changes how serious AI-assisted work should begin. Rather than immediately prompting a model to produce a document or analysis, the author advocates a preparatory phase in which the agent constructs a structured "data room" (an organized file environment containing all relevant source materials) before any drafting or synthesis begins. The reasoning is that most professional projects start with disorganized, contradictory, and partially stale source material, and that asking an AI to write from that chaos forces it to organize and produce at the same time, which is the condition under which hallucination becomes more likely.
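The inventory step described above can be illustrated with a small script. This is only a minimal sketch of the idea, not the author's tooling: the function names `build_inventory` and `find_duplicates` and the manifest fields are assumptions chosen for illustration. It walks a folder tree, records each file's relative path, size, modification time, and content hash, and then groups byte-identical files so stale copies are easier to spot.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def build_inventory(root: Path) -> list[dict]:
    """Walk `root` and record one manifest entry per file (hypothetical schema)."""
    entries = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories; only files go into the manifest
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": stat.st_size,
            "modified": modified.isoformat(),
            # Reads the whole file into memory; fine for a sketch, not for huge files.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return entries


def find_duplicates(entries: list[dict]) -> dict[str, list[str]]:
    """Group paths by content hash and keep only hashes seen more than once."""
    by_hash: dict[str, list[str]] = {}
    for entry in entries:
        by_hash.setdefault(entry["sha256"], []).append(entry["path"])
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}
```

A manifest like this gives a human reviewer something concrete to check before any drafting starts: which files exist, which are duplicates, and which were last modified long before the others. It says nothing about whether the contents of those files are correct.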

The workflow the author describes separates two jobs that have historically been collapsed into a single prompt: the organizational job and the generative job. By offloading the organizational phase to an agent capable of file-system traversal, the human operator can verify the factual scaffolding before any substantive drafting occurs. The author reports personal experience using this approach in Codex to draft up to eight documents in parallel, attributing the speed and accuracy of that output directly to the preparatory data room phase. The argument is not that this approach eliminates hallucination (the author is explicit that it does not) but that it creates a structural environment in which hallucinations are significantly less likely and, critically, more detectable before outputs are finalized and submitted to consequential audiences like federal judges.
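One way to make hallucinations "more detectable" at a checkpoint is to require every claim in a draft to point back at a file in the data room, and then check those pointers mechanically. The sketch below assumes a hypothetical `[src: relative/path]` marker convention in drafts; neither the marker nor the function name `unverified_sources` comes from the article. It reports any cited path that does not appear in the inventory.

```python
import re

# Hypothetical citation marker, e.g. "[src: filings/motion.pdf]".
SOURCE_TAG = re.compile(r"\[src:\s*([^\]]+?)\s*\]")


def unverified_sources(draft: str, inventory_paths: set[str]) -> list[str]:
    """Return cited paths missing from the inventory, in order of first appearance."""
    missing: list[str] = []
    for cited in SOURCE_TAG.findall(draft):
        if cited not in inventory_paths and cited not in missing:
            missing.append(cited)
    return missing


draft = (
    "The hearing was rescheduled [src: filings/order.pdf]. "
    "The creditor agreed to the terms [src: notes/call_summary.txt]."
)
# Only the first cited file exists in this toy inventory.
print(unverified_sources(draft, {"filings/order.pdf"}))
```

A check like this only confirms that a cited file exists in the data room. It does not confirm that the file supports the claim, so it narrows what a human reviewer has to read rather than replacing the review.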

This framing connects to a broader evolution in how the AI industry and its professional users are reconceptualizing the role of language models in high-stakes workflows. The 2022–2024 era of AI adoption was largely characterized by direct task delegation — asking a model to write, code, or analyze in a single prompt — which placed the burden of accuracy entirely on the model's internal generation process. The emergence of agentic capabilities, particularly those tied to persistent file systems and multi-step reasoning over structured data, shifts the locus of quality control earlier in the workflow and makes it amenable to human review at discrete checkpoints. The Sullivan & Cromwell case functions in the article as a warning about the costs of applying the older, prompt-centric paradigm to workflows that have outgrown it in scale and consequence. As law firms, financial institutions, and other regulated industries deepen their AI integration, the distinction between model capability and workflow design is likely to become increasingly central to both professional practice standards and legal liability frameworks.

