engram now shows you the tokens it saved in a live web dashboard. Just hit 88.1% session savings on average.

Reddit · SearchFlashy9801 · April 19, 2026
Engram v2.0 released a web dashboard that displays token savings metrics from Claude Code sessions, showing real usage data such as 86,000 tokens saved and an 88.4% average reduction rate. The tool intercepts file operations in Claude Code and serves compact context packets from eight parallel providers instead of full files. A security vulnerability in v2.0.0 involving CORS and disabled authentication was discovered and patched in v2.0.2.

Detailed Analysis

Engram, an open-source persistent memory and context-compression tool built for Claude Code, reached version 2.0 with the addition of a live web dashboard that visualizes token savings and knowledge graph activity in real time. The tool intercepts Claude Code's file operations, specifically via hooks on Read, Edit, and Write, and substitutes compact context packets for full file contents. Those packets are assembled from eight parallel providers covering code structure, known bugs, git history, semantic memory, external documentation via Context7, Obsidian notes, reusable skills, and logged mistakes. Over the developer's own week of usage, engram reported 86,000 tokens saved, an 88.4% average context reduction, and 485 cache hits across 1,005 knowledge-graph edges. The token savings translate to roughly $0.26 in API costs at standard Claude pricing. The dashboard, launched with a single `engram ui` command, presents six tabs, including a live Activity feed streamed over Server-Sent Events and a force-directed graph of the agent's accumulated knowledge rendered with Canvas 2D.
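The idea of assembling one packet from several providers queried in parallel can be sketched as follows. This is a minimal illustration, not engram's code: the provider names, `query_provider`, `build_context_packet`, and the timeout value are all hypothetical, and the stand-in lookup returns a placeholder string rather than real context.

```python
import asyncio

# Hypothetical names mirroring the eight provider categories listed in the post.
PROVIDERS = [
    "code_structure", "known_bugs", "git_history", "semantic_memory",
    "context7_docs", "obsidian_notes", "skills", "mistakes",
]

async def query_provider(name: str, path: str) -> str:
    """Stand-in for a real provider lookup; returns a one-line placeholder."""
    await asyncio.sleep(0)  # yield control, as an I/O-bound lookup would
    return f"[{name}] notes for {path}"

async def build_context_packet(path: str, timeout: float = 1.0) -> str:
    """Query every provider concurrently and join the answers that arrive in time."""
    tasks = [asyncio.wait_for(query_provider(n, path), timeout) for n in PROVIDERS]
    # return_exceptions=True keeps one slow or failing provider from sinking the read.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    sections = [r for r in results if isinstance(r, str)]
    return "\n".join(sections)

if __name__ == "__main__":
    print(asyncio.run(build_context_packet("src/app.py")))
```

Dropping failed or timed-out providers rather than raising is one possible design choice here; it trades completeness of the packet for a bounded delay on every intercepted file read.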

The release was followed almost immediately by a security hotfix, v2.0.2, addressing a significant vulnerability disclosed as GHSA-2r2p-4cgf-hv7h. Version 2.0.0 shipped with a CORS wildcard and with authentication disabled by default, a combination that would have allowed any browser tab on a user's machine to read the local knowledge graph or inject malicious data into the mistake-memory store, effectively poisoning the agent's learned behaviors. The fix applied four layered defenses: fail-closed authentication, removal of the CORS wildcard, Host and Origin header validation, and strict Content-Type enforcement. The report is credited to community member u/gabiudrescu, and the rapid turnaround from disclosure to patch reflects a standard responsible-disclosure cycle. The nature of the vulnerability, local graph poisoning, nonetheless highlights a non-obvious attack surface that emerges specifically when AI tools maintain persistent, writable local state.
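The four defenses named above can be illustrated with a small request check. This is a sketch under stated assumptions, not engram's actual patch: the port number, the allowlists, the Bearer token scheme, and the function names `check_request` and `cors_headers` are all hypothetical.

```python
import hmac

# Hypothetical local addresses; the real port and hosts are not given in the post.
ALLOWED_HOSTS = {"127.0.0.1:7337", "localhost:7337"}
ALLOWED_ORIGINS = {"http://127.0.0.1:7337", "http://localhost:7337"}

def check_request(method, headers, expected_token):
    """Return (status, reason); 200 means the request may proceed."""
    h = {k.lower(): v for k, v in headers.items()}
    # 1. Fail-closed auth: with no token configured, every request is refused.
    if not expected_token:
        return 401, "auth not configured"
    supplied = h.get("authorization", "")
    wanted = f"Bearer {expected_token}"  # assumed scheme
    if not hmac.compare_digest(supplied.encode(), wanted.encode()):
        return 401, "bad token"
    # 2. Host validation: reject names that do not point at the local server.
    if h.get("host") not in ALLOWED_HOSTS:
        return 403, "bad host"
    # 3. Origin validation: a browser-supplied Origin must be on the allowlist.
    origin = h.get("origin")
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return 403, "bad origin"
    # 4. Strict Content-Type on writes: only JSON bodies are accepted.
    if method in {"POST", "PUT", "PATCH"}:
        ctype = h.get("content-type", "").split(";")[0].strip().lower()
        if ctype != "application/json":
            return 415, "unsupported content type"
    return 200, "ok"

def cors_headers(origin):
    """No wildcard: echo the origin back only when it is allowlisted."""
    if origin in ALLOWED_ORIGINS:
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}
```

The Content-Type check matters because `application/json` is not a CORS-safelisted content type, so a cross-origin page cannot send it without first passing a preflight request.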

The broader significance of engram's approach lies in its position within a growing ecosystem of context-efficiency tooling built around large language model APIs. As token costs remain a practical constraint for developers running extended agentic sessions, third-party memory and compression layers have emerged as a category of their own, sitting between the user's codebase and the model's context window. Engram is local-only, has zero native dependencies, is Apache 2.0 licensed, and ships with 670 tests, which positions it as infrastructure rather than a hosted service and gives developers full control over what the agent remembers and how that memory influences future sessions. The roughly 88% reduction figure, if reproducible across diverse codebases, would represent a meaningful shift in the economics of running Claude Code on large repositories where file reads are frequent.
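For a sense of scale, the arithmetic behind the roughly $0.26 savings figure quoted earlier can be checked in a few lines. The post does not state which price it used; the $3 per million input tokens below is an assumption, chosen because it reproduces the quoted number, and `input_cost_usd` is a hypothetical helper.

```python
TOKENS_SAVED = 86_000       # weekly figure reported in the post
PRICE_PER_MTOK_USD = 3.00   # assumption: $3 per million input tokens

def input_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost in USD of sending `tokens` input tokens at the given per-million price."""
    return tokens / 1_000_000 * price_per_mtok

savings = input_cost_usd(TOKENS_SAVED, PRICE_PER_MTOK_USD)
print(f"${savings:.2f}")  # prints $0.26
```

At this volume the dollar amount is small; the reduction percentage matters more as usage grows, since the same ratio applied to millions of tokens per week scales the savings linearly.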

The addition of a real-time dashboard is notable not only as a usability feature but as a transparency mechanism. By streaming hook decisions live and rendering the knowledge graph visually, engram makes legible a process — context selection and compression — that would otherwise be entirely opaque to the developer. This aligns with a broader trend in AI tooling toward interpretability at the infrastructure layer, where developers increasingly want to audit not just what a model outputs but what information it was given and why. As agentic coding workflows mature, tools that expose the memory and context layer rather than abstracting it away entirely are likely to become a standard expectation rather than a differentiating feature.
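Streaming hook decisions to a browser over Server-Sent Events comes down to writing small text frames on a long-lived HTTP response. The sketch below shows only the framing; the event name `hook_decision` and the payload fields are hypothetical, since the post does not describe engram's event schema.

```python
import json
from typing import Optional

def format_sse(event: str, data: dict, event_id: Optional[int] = None) -> str:
    """Serialize one Server-Sent Events frame: field lines ended by a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so a single data: line is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"

# Hypothetical payload for one intercepted Read call.
frame = format_sse(
    "hook_decision",
    {"tool": "Read", "path": "src/app.py", "tokens_saved": 1200},
    event_id=1,
)
```

A server would write frames like this to a response with the `text/event-stream` content type, and a browser's `EventSource` would dispatch each one as an event of the named type.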
