Built a Chrome extension for the long-session degradation problem — want this sub's read on whether it's actually useful

A developer created Curlo, a Chrome extension designed to address context degradation that occurs during extended Claude conversations when the model loses coherence despite technically retaining information. The tool provides a context window fill indicator and one-tap checkpoints that save structured conversation summaries locally, allowing users to transition to fresh chats while maintaining context continuity. The extension operates entirely client-side without requiring accounts or backend infrastructure.

Detailed Analysis

A Reddit user identifying as a long-time Claude user has developed and publicly shared a Chrome extension called Curlo, designed to address what they term the "long-session degradation problem": a behavioral pattern in which Claude's adherence to user-defined constraints and conversational context weakens as a conversation grows longer. The developer describes a reproducible failure mode in which instructions set dozens of messages earlier lose their hold on the model's outputs, and restating them produces only temporary compliance before the drift resumes. Critically, the author distinguishes this from a raw context window limitation, arguing that degradation occurs well before the technical ceiling of 200K or even 1M tokens, which suggests the issue is one of attention weighting rather than memory capacity.

Curlo's proposed solution has three parts: a visual ring indicator displaying window fill percentage; a one-tap "checkpoint" that fires a structured prompt to extract decisions, progress, open questions, and next steps into a locally saved summary; and a delta-based checkpoint design intended to keep summaries compact. The tool is fully client-side with no backend infrastructure, and the developer is planning optional Notion sync and an on-device AI-assisted prompt assembly feature.
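To make the fill indicator and the delta checkpoint idea concrete, here is a minimal TypeScript sketch. It is not Curlo's code. The `Checkpoint` shape, the function names, and the roughly-4-characters-per-token estimate are all assumptions made for illustration, and "delta" is interpreted here as "only the items that were not in the previous checkpoint", which the post does not spell out.

```typescript
// Hypothetical checkpoint shape; the four fields mirror the categories named in the post.
interface Checkpoint {
  decisions: string[];
  progress: string[];
  openQuestions: string[];
  nextSteps: string[];
}

// Assumption: a crude heuristic of about 4 characters per token for English text.
// A real tokenizer would give different counts.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Estimated window fill for a ring indicator, as a whole percentage capped at 100.
function fillPercent(messages: string[], windowTokens: number): number {
  const used = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  return Math.min(100, Math.round((used / windowTokens) * 100));
}

// Keep only the items in `after` that do not already appear in `before`.
function newItems(before: string[], after: string[]): string[] {
  return after.filter((item) => before.indexOf(item) === -1);
}

// Delta checkpoint: store only what changed since the previous checkpoint.
function deltaCheckpoint(prev: Checkpoint, next: Checkpoint): Checkpoint {
  return {
    decisions: newItems(prev.decisions, next.decisions),
    progress: newItems(prev.progress, next.progress),
    openQuestions: newItems(prev.openQuestions, next.openQuestions),
    nextSteps: newItems(prev.nextSteps, next.nextSteps),
  };
}
```

Under these assumptions, a conversation estimated at 100 tokens against a 1,000-token window reads as 10% full, and a second checkpoint that repeats an earlier decision stores only the decisions added since. Exact string matching is the simplest possible notion of "unchanged"; a real tool would likely need something more forgiving, since a model rarely restates an item word for word.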

The problem Curlo targets is meaningfully distinct from the cross-session memory tools and context indicator utilities the author surveys. Services like Mem0 and MemoryPlugin address persistence of user identity and preferences across separate conversations, while token-fill indicators like Context Compass and TokenFlow provide diagnostic visibility but leave the remediation work entirely to the user. Claude's own server-side auto-summarization, meanwhile, operates opaquely and cannot be triggered on demand or inspected for fidelity. The gap Curlo is trying to close — the workflow between noticing degradation and successfully resuming in a new session without material context loss — is a genuine and underserved one. The structured checkpoint approach is particularly notable because it externalizes the summarization task back to Claude itself under controlled prompting conditions, rather than relying on either the user to manually reconstruct context or the platform's automatic compression to make the right editorial choices.
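The resume half of that workflow, turning a saved checkpoint into the first message of a fresh chat, can be sketched in a few lines of TypeScript. This is an illustrative assumption about how such a step might look, not Curlo's implementation: `SavedCheckpoint`, `renderSection`, `buildResumePrompt`, and the prompt wording are all hypothetical.

```typescript
// Hypothetical checkpoint shape; the four fields mirror the categories named in the post.
interface SavedCheckpoint {
  decisions: string[];
  progress: string[];
  openQuestions: string[];
  nextSteps: string[];
}

// Render one titled bullet section, or an empty string when there is nothing to list.
function renderSection(title: string, items: string[]): string {
  if (items.length === 0) return "";
  const bullets = items.map((item) => "- " + item).join("\n");
  return title + ":\n" + bullets + "\n\n";
}

// Build the opening message for a fresh chat from a saved checkpoint.
// The wording is an assumption; any real prompt would need tuning.
function buildResumePrompt(cp: SavedCheckpoint): string {
  return (
    "Continuing earlier work. Context from the previous session:\n\n" +
    renderSection("Decisions", cp.decisions) +
    renderSection("Progress", cp.progress) +
    renderSection("Open questions", cp.openQuestions) +
    renderSection("Next steps", cp.nextSteps) +
    "Treat the decisions above as fixed unless I say otherwise."
  );
}
```

Omitting empty sections keeps the opening message short, and placing the constraints at the very start of a new conversation is the point of the exercise: the instructions that had drifted in the old session begin the new one in the most recent context.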

The broader phenomenon the post describes reflects a known and actively discussed limitation of large language models on long inputs. Transformer-based architectures process context through attention mechanisms that, in practice, do not weight all tokens equally: a token's influence depends partly on its position, not only on its semantic importance. Instructions and constraints introduced early in a long conversation can be progressively outcompeted by more recent tokens, even while they remain technically "visible" to the model. Anthropic's Claude models, despite their large context windows, are not immune to this dynamic. The developer's observation that "usable performance drops well before the limit" is consistent with research on what is sometimes called "lost in the middle" degradation, where retrieval and instruction-following accuracy degrade for content positioned far from the beginning or end of a long context window.

From a product landscape perspective, the extension arrives where native features cover only part of the problem. Anthropic's Projects feature, which the author directly asks about, offers shared system-level context that persists across sessions. That is a partial architectural mitigation, but it does not solve intra-session drift in long, continuous conversations. The author's uncertainty about whether Projects meaningfully delays hitting the degradation wall mid-conversation points to a real information gap: Anthropic has not published granular guidance on the behavioral tradeoffs of Projects versus standard sessions at high message counts. This creates space for third-party tooling to fill both the workflow gap and the knowledge gap, which is what Curlo attempts to do.

The post itself functions as both a product launch and a community research inquiry, asking Claude power users to validate whether the problem is real at scale, share system prompt strategies that delay drift, and benchmark Projects' effectiveness. This crowd-sourced validation approach is increasingly common in the Claude and broader LLM enthusiast community, where heavy users have developed sophisticated folk knowledge about model behavior that often outpaces official documentation. That the developer built a complete tool before confirming community demand, and is explicitly inviting critique, reflects the rapid, low-friction development cycle enabled by platforms like Lovable. The extension's landing page link points to a Lovable URL, which suggests the frontend was itself AI-assisted. Whether Curlo finds sustained adoption will likely depend on how Anthropic's own context management features evolve, but the problem it identifies is substantive, and the workflow gap it targets is not addressed by current native tooling.
