Claude Code keeps looping on my fixes

A Claude Code user found the assistant repeatedly re-suggesting fixes that had already been undone. As the session neared its token ceiling, the assistant also began hallucinating earlier edits, and the episode ended with $1,400 in surprise bills. In response, a context layer called engramx was developed that re-reads only the files the assistant modified, cutting token usage by 89.1% on a test repository. It is available via npx engramx@4.0.0 under the Apache 2.0 license and is backed by a local SQLite database.

Detailed Analysis

A developer working with Claude Code documented a recurring failure mode in long-running AI-assisted coding sessions: the assistant re-suggested fixes the user had already undone, cycling through the same incorrect recommendations three times in a row. As the session approached its token ceiling, the behavior degraded further into hallucinating earlier edits, that is, fabricating a history of changes that had never been made. The practical consequence was an unexpected $1,400 bill incurred while the user chased a non-existent problem. The episode illustrates how context window exhaustion in agentic coding workflows can produce both technical failure and significant financial cost.

The developer measured the underlying issue against an 87-file repository, finding that naive context loading produced a raw count of 163,122 tokens. With a dedicated context management layer called engramx (installed via `npx engramx@4.0.0`), that count fell to 17,722 tokens, an 89.1% reduction. The tool achieves this by selectively re-reading only the files the assistant actually modified rather than pulling the entire codebase into context. The author reports a 6.4x token reduction compared to loading only relevant files, and up to a 155x reduction compared to full-codebase ingestion in the most favorable scenarios. The system installs six "Sentinel hooks" by default, which attach to Claude Code's PreToolUse lifecycle events for Edit, Write, and Bash operations. It also maintains a bi-temporal index, backed by a local SQLite database, that automatically captures revert commits.
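The headline percentage can be checked directly from the two token counts quoted above. The short Python sketch below does only that arithmetic; the function names are illustrative and are not part of engramx.

```python
# Token counts reported for the 87-file test repository.
RAW_TOKENS = 163_122      # naive context loading
REDUCED_TOKENS = 17_722   # with the context layer in place


def reduction_percent(before: int, after: int) -> float:
    """Share of tokens removed, as a percentage of the original count."""
    return (before - after) / before * 100


def reduction_factor(before: int, after: int) -> float:
    """How many times smaller the reduced context is."""
    return before / after


print(f"{reduction_percent(RAW_TOKENS, REDUCED_TOKENS):.1f}% fewer tokens")  # 89.1% fewer tokens
print(f"{reduction_factor(RAW_TOKENS, REDUCED_TOKENS):.1f}x smaller")        # 9.2x smaller
```

On these figures the 89.1% reduction corresponds to a context about 9.2 times smaller. The 6.4x and 155x figures come from different baselines whose raw counts are not given, so they cannot be recomputed here.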

The problem this tool addresses reflects a structural challenge in all long-context AI coding agents, not merely Claude Code. When a session accumulates edits, reverts, and iterative changes across dozens of files, the assistant's effective working memory becomes polluted with stale or contradictory information. The model cannot distinguish the current state of the codebase from its state at earlier points in the session. This produces the looping behavior the author observed: the model confidently re-proposes solutions it cannot "remember" having already proposed and seen rejected. The bi-temporal indexing approach, which tracks both when a change was made and when it was registered, is a meaningful architectural response. It gives the agent a reliable ground truth about file states without relying on the context window itself as the source of record.
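To make the bi-temporal idea concrete, here is a minimal sketch using Python's standard `sqlite3` module. It is not engramx's actual schema or code: the table `file_events`, its columns, the integer logical timestamps, and every function name are assumptions chosen for illustration. Each event stores two times, when the change happened (`valid_at`) and when the index learned of it (`recorded_at`), and a revert event names the content hash it undid.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open a (hypothetical) bi-temporal index backed by SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS file_events (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            path         TEXT NOT NULL,
            content_hash TEXT NOT NULL,    -- file state after the event
            kind         TEXT NOT NULL,    -- 'edit' or 'revert'
            undoes_hash  TEXT,             -- state a revert threw away
            valid_at     INTEGER NOT NULL, -- when the change happened
            recorded_at  INTEGER NOT NULL  -- when the index learned of it
        )
        """
    )
    return conn


def record(conn, path, content_hash, kind, valid_at, recorded_at, undoes_hash=None):
    """Append one event; rows are never updated or deleted."""
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO file_events "
            "(path, content_hash, kind, undoes_hash, valid_at, recorded_at) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (path, content_hash, kind, undoes_hash, valid_at, recorded_at),
        )


def current_state(conn, path):
    """Latest known state of a file, ordered by when changes happened."""
    return conn.execute(
        "SELECT content_hash, kind FROM file_events WHERE path = ? "
        "ORDER BY valid_at DESC, id DESC LIMIT 1",
        (path,),
    ).fetchone()


def known_as_of(conn, path, recorded_at):
    """State of a file as the index believed it to be at a given time."""
    return conn.execute(
        "SELECT content_hash, kind FROM file_events "
        "WHERE path = ? AND recorded_at <= ? "
        "ORDER BY valid_at DESC, id DESC LIMIT 1",
        (path, recorded_at),
    ).fetchone()


def rejected_hashes(conn, path):
    """Content states the user has already undone for this file."""
    rows = conn.execute(
        "SELECT DISTINCT undoes_hash FROM file_events "
        "WHERE path = ? AND kind = 'revert' AND undoes_hash IS NOT NULL",
        (path,),
    ).fetchall()
    return {h for (h,) in rows}


conn = open_index()
record(conn, "app.py", "h0", "edit", valid_at=1, recorded_at=1)
record(conn, "app.py", "h1", "edit", valid_at=2, recorded_at=2)
# The revert happened at t=3, but the index only learned of it at t=5.
record(conn, "app.py", "h0", "revert", valid_at=3, recorded_at=5, undoes_hash="h1")

print(current_state(conn, "app.py"))    # ('h0', 'revert')
print(known_as_of(conn, "app.py", 4))   # ('h1', 'edit')
print(rejected_hashes(conn, "app.py"))  # {'h1'}
```

The gap between the last two queries is the point of keeping both timestamps: at time 4 the index still believed the undone fix `h1` was live, while the current view shows the file back at `h0`. An agent consulting such a record, rather than its own context window, could check a proposed change against `rejected_hashes` before suggesting it again.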

More broadly, this post is representative of an emerging category of community-built tooling designed to patch gaps in first-party AI coding assistant infrastructure. Claude Code, GitHub Copilot Workspace, and similar agentic tools have all encountered friction around context window management in large or complex repositories. The fact that a third-party Apache 2.0 tool running entirely locally can produce a roughly 6x to 9x token reduction suggests that official implementations may not yet be optimizing aggressively for selective context injection. The financial dimension, surprise billing from runaway token consumption, adds urgency to the problem. As AI coding tools move from novelty to professional infrastructure, robustness guarantees around cost and session stability will become as important as raw capability.
