How to optimise token usage when working with a large document project (Legal Case)?

A Claude user working on complex legal case analysis through the Projects feature reported excessive token consumption during document drafting and research. The user's workflow involved processing a 15-page primary document along with frequently uploaded supporting materials, causing token usage to spike with each prompt because Claude processes the entire chat history and all attached documents with every new request. The user sought advice on managing chat context efficiency, organizing supporting materials, and optimizing prompts to limit processing to specific document sections rather than full documents.

Detailed Analysis

A Reddit user working on a complex legal case with Claude's Projects feature has surfaced a practical and increasingly common challenge: the rapid depletion of token limits during long, document-heavy working sessions. The user's setup involves uploading primary case documents to a Claude Project, actively drafting and analyzing a 15-page document within chat sessions, and attaching additional supporting materials (evidence, references, and citations) as the work progresses. The core technical problem is that the entire accumulated chat history and all attached documents are passed into the context window with each new prompt. The cost of each prompt therefore grows roughly linearly as the conversation lengthens and attachments accumulate, and the cumulative cost of a session grows roughly quadratically with the number of turns. The user is soliciting workflow strategies from the community around three specific pain points: managing active chat context, handling supporting materials, and prompting techniques that constrain Claude's attention to specific document sections rather than the entire corpus.
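To make the growth concrete, here is a minimal sketch of a simplified cost model. It assumes the attached document stays in context for the whole session and that every exchange adds a fixed number of tokens to the history; it ignores any caching or retrieval the product may apply. The function name and all token figures are hypothetical and chosen only for illustration.

```python
def input_tokens_per_turn(doc_tokens: int, turn_tokens: int, turns: int) -> list[int]:
    """Approximate input tokens sent on each prompt when the whole history is resent.

    Assumed model (not a description of Claude's actual accounting): the
    attached document is always in context, and each exchange (prompt plus
    reply) adds `turn_tokens` to the history.
    """
    return [doc_tokens + k * turn_tokens for k in range(1, turns + 1)]


# Hypothetical figures: a document of about 10,000 tokens, about 500 tokens per exchange.
per_turn = input_tokens_per_turn(doc_tokens=10_000, turn_tokens=500, turns=20)
print(per_turn[0], per_turn[-1])  # size of the first and the last prompt
print(sum(per_turn))              # total input tokens across the session
```

Under these assumptions the last prompt is about twice the size of the first, and the session total is dominated by the document being resent on every turn.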

The situation the user describes reflects a genuine architectural tension in how large language model-based tools handle persistent context. Claude Projects are designed to maintain a knowledge base across sessions, but the distinction between Project-level knowledge and in-session attachments carries significant implications for token consumption. Documents stored in the Project Knowledge base are, in principle, available to Claude without being re-injected into every conversational turn in the same way that in-chat attachments are—though the precise mechanics differ depending on how Anthropic has implemented retrieval within Projects. The user's instinct to migrate supporting documents out of active chat and into the Project Knowledge base is directionally sound, as in-chat attachments contribute directly to the rolling context window in a way that inflates token counts with each subsequent message. Starting fresh chat threads for discrete tasks or document sections is a standard mitigation strategy, effectively resetting the context overhead at the cost of losing conversational continuity.
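The trade-off of starting fresh threads can be estimated with a rough sketch. It assumes that every prompt resends the full thread history, that each thread starts with a fixed base context, and that continuity is partly preserved by pasting a short summary into each new thread. The function names, the summary step, and all numbers are illustrative assumptions rather than documented Claude behavior.

```python
def thread_cost(base_tokens: int, turn_tokens: int, turns: int) -> int:
    """Total input tokens for one thread: sum over k = 1..turns of (base + k * turn_tokens)."""
    return turns * base_tokens + turn_tokens * turns * (turns + 1) // 2


def split_cost(base_tokens: int, turn_tokens: int, turns: int,
               threads: int, summary_tokens: int = 0) -> int:
    """Total input tokens when the same work is spread evenly over several fresh threads.

    Each fresh thread carries `summary_tokens` of pasted recap in addition
    to the base context. Assumes `turns` is divisible by `threads`.
    """
    per_thread = turns // threads
    return threads * thread_cost(base_tokens + summary_tokens, turn_tokens, per_thread)


# Hypothetical session: 40 exchanges of about 800 tokens, 2,000 tokens of base context.
one_thread = thread_cost(base_tokens=2_000, turn_tokens=800, turns=40)
four_threads = split_cost(base_tokens=2_000, turn_tokens=800, turns=40,
                          threads=4, summary_tokens=500)
print(one_thread, four_threads)
```

In this model the saving comes from the history term, which grows with the square of the thread length, so several short threads cost less than one long thread even after paying for a recap in each.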

From a legal workflow perspective, the challenge is particularly acute because legal document analysis is inherently iterative and cross-referential. Attorneys and legal professionals must frequently triangulate between primary case documents, evidentiary exhibits, precedent citations, and statutory references, which is precisely the kind of multi-document, multi-turn engagement that strains context windows. Prompting techniques that anchor Claude's attention to specific passages, such as quoting the relevant paragraph directly in the prompt rather than relying on Claude to locate it within a large attached document, can reduce how much text the model has to search and improve response precision. Similarly, structured prompting that instructs Claude to respond only to a delimited section, combined with concise, targeted turns rather than open-ended exploratory questions, slows the growth of the conversation history.
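A minimal sketch of the quoting approach follows. It pulls a range of paragraphs out of a local copy of the document and wraps them in delimiters inside the prompt, so that only that passage needs to be sent. The helper names, the `<section>` tag, and the sample text are hypothetical; any clear delimiter would serve the same purpose.

```python
def extract_paragraphs(document: str, start: int, end: int) -> str:
    """Return paragraphs [start, end) of a document whose paragraphs are separated by blank lines."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    return "\n\n".join(paragraphs[start:end])


def build_scoped_prompt(passage: str, instruction: str) -> str:
    """Build a prompt that quotes one passage and asks for a response limited to it."""
    return (
        "Respond only to the text inside the <section> tags.\n"
        f"<section>\n{passage}\n</section>\n"
        f"Task: {instruction}"
    )


# Placeholder text standing in for a draft; not taken from any real case.
draft = "Background of the dispute.\n\nStatement of the claim.\n\nRelief sought."
prompt = build_scoped_prompt(
    extract_paragraphs(draft, 1, 2),
    "Tighten the wording without changing the legal meaning.",
)
print(prompt)
```

Sending only the quoted paragraph in a fresh or short thread keeps the prompt small, at the cost of the model not seeing the rest of the draft unless it is summarized or quoted as well.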

The post connects to a broader challenge facing professional users of frontier AI tools: the gap between the theoretical capacity of large context windows and the practical economics of using them in sustained, production-grade workflows. Anthropic has significantly expanded Claude's context window in successive model generations, yet longer context does not eliminate the cost and efficiency considerations that emerge in real-world use. Token limits in subscription tiers create a ceiling that many power users encounter, particularly in domains like law, medicine, and research where document density is high and analytical depth is required. This dynamic is pushing sophisticated users toward workflow engineering—essentially designing document management and prompting strategies as a discipline in their own right—rather than simply relying on the model's raw capability to absorb and process everything simultaneously.

The discussion also highlights a product design challenge for Anthropic as it develops the Projects feature for professional use cases. The user's uncertainty about whether documents in Project Knowledge are "forgotten" or deprioritized relative to in-chat attachments suggests that the interface could benefit from greater transparency about how different storage locations affect context injection and retrieval behavior. As Claude is increasingly positioned for enterprise and professional applications, clearer documentation and potentially finer-grained user controls over context management—such as the ability to pin or exclude specific documents from active context—would address a real and recurring friction point for the segment of users undertaking the most demanding analytical work.
