I built a free local MCP server that cut my Claude Code PR review prompt from 63K to 8.7K tokens

A developer created graphify-ts, a free local MCP server that uses a code knowledge graph to reduce token usage and latency in Claude Code sessions. The tool builds an index of the code at startup using tree-sitter ASTs and community detection, allowing Claude to retrieve relevant code slices in a single call instead of multiple sequential tool invocations. Measured results on production codebases showed 2.6× fewer input tokens, 2.8× lower latency, and a 7.25× smaller PR review prompt compared with the baseline approach.

Detailed Analysis

A developer has released graphify-ts, an open-source, locally run MCP (Model Context Protocol) stdio server designed to reduce the token overhead Claude Code incurs when navigating large codebases. The tool pre-indexes a repository at build time using a combination of tree-sitter abstract syntax trees, Louvain community detection for clustering, BM25 lexical retrieval, and an optional ONNX-based reranker. The result is a persistent knowledge graph that Claude Code can query with a single `retrieve` call rather than the 8–10 sequential Glob, Grep, and Read tool calls the agent typically chains together when exploring unfamiliar code. The author published concrete, independently verifiable benchmarks: on a 1,268-file NestJS and Next.js production codebase, token consumption dropped from 615,190 to 233,508 (a 2.6× reduction) and latency fell from 96 seconds to 35 seconds. On a 36-file PR review, prompt tokens shrank from 63,024 to 8,690 (a 7.25× reduction), with equivalent review quality and hotspot detection reported for both runs.
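The reduction factors quoted above can be rechecked from the raw figures. The short sketch below does only that arithmetic; the type and function names are illustrative and the numbers are copied from the reported benchmarks, not newly measured.

```typescript
// Figures as reported in the article; nothing here is a new measurement.
interface Measurement {
  label: string;
  baseline: number;  // value without graphify-ts
  withGraph: number; // value with graphify-ts
}

const reported: Measurement[] = [
  { label: "input tokens (1,268-file codebase)", baseline: 615190, withGraph: 233508 },
  { label: "latency in seconds", baseline: 96, withGraph: 35 },
  { label: "PR review prompt tokens (36 files)", baseline: 63024, withGraph: 8690 },
];

// How many times smaller the optimized figure is than the baseline.
function reductionFactor(m: Measurement): number {
  return m.baseline / m.withGraph;
}

for (const m of reported) {
  console.log(`${m.label}: ${reductionFactor(m).toFixed(2)}x`);
}
```

The token figures come out at roughly 2.63x and 7.25x, matching the stated reductions. The rounded latency values give about 2.74x, so the 2.8× headline figure presumably reflects unrounded timings.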

The core insight driving graphify-ts is that Claude Code's default exploration behavior is stateless across sessions, so the agent rediscovers the same architectural structure repeatedly instead of consulting a precomputed index. This is not a flaw unique to Claude: LLM-based coding agents that explore through tool calls face the same structural problem, because they do not hold a persistent in-memory representation of the program. By moving structural analysis to index time and exposing a narrow retrieval interface, graphify-ts separates the static analysis phase from the agentic query phase. The privacy-first design, in which the code never leaves the local machine, also addresses a practical concern for developers in regulated industries or on proprietary codebases who may be reluctant to route source code through cloud-based analysis pipelines.
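The split between an index phase and a query phase can be illustrated with a minimal BM25 sketch. This is not graphify-ts's implementation or API: `Chunk`, `buildIndex`, and this `retrieve` function are hypothetical names, the sample chunks are invented, and the graph, clustering, and reranking stages are left out entirely.

```typescript
// Hypothetical types for illustration only; not graphify-ts's real data model.
interface Chunk { id: string; text: string }
interface Entry { chunk: Chunk; termCounts: Map<string, number>; length: number }
interface Bm25Index { entries: Entry[]; docFreq: Map<string, number>; avgLength: number }

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

// Index phase: runs once, before any agent session starts.
function buildIndex(chunks: Chunk[]): Bm25Index {
  const docFreq = new Map<string, number>();
  let totalLength = 0;
  const entries = chunks.map((chunk): Entry => {
    const tokens = tokenize(chunk.text);
    const termCounts = new Map<string, number>();
    for (const t of tokens) termCounts.set(t, (termCounts.get(t) ?? 0) + 1);
    // Each distinct term in this chunk adds one to its document frequency.
    termCounts.forEach((_count, t) => docFreq.set(t, (docFreq.get(t) ?? 0) + 1));
    totalLength += tokens.length;
    return { chunk, termCounts, length: tokens.length };
  });
  const avgLength = entries.length > 0 ? totalLength / entries.length : 0;
  return { entries, docFreq, avgLength };
}

// Query phase: a single call returns the top-k chunks ranked by BM25 score.
function retrieve(index: Bm25Index, query: string, k = 3, k1 = 1.2, b = 0.75): Chunk[] {
  const n = index.entries.length;
  const terms = Array.from(new Set(tokenize(query)));
  const scored = index.entries.map((entry) => {
    let score = 0;
    for (const term of terms) {
      const f = entry.termCounts.get(term) ?? 0;
      if (f === 0) continue;
      const df = index.docFreq.get(term) ?? 0;
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      const lengthNorm = 1 - b + b * (entry.length / index.avgLength);
      score += (idf * f * (k1 + 1)) / (f + k1 * lengthNorm);
    }
    return { chunk: entry.chunk, score };
  });
  return scored
    .filter((s) => s.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.chunk);
}

// Invented sample data: built once, then queried as often as needed.
const index = buildIndex([
  { id: "auth.service.ts", text: "class AuthService validates jwt token and loads user session" },
  { id: "orders.controller.ts", text: "class OrdersController lists orders and creates order records" },
  { id: "jwt.guard.ts", text: "guard that checks jwt token on each request" },
]);
// Returns the two jwt-related chunks; the orders controller scores zero and is dropped.
console.log(retrieve(index, "jwt token").map((c) => c.id));
```

The point of the design is that `buildIndex` pays the tokenizing and counting cost once, while each `retrieve` call is a cheap lookup, which is the same trade the article describes at a much larger scale.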

The project's published trade-offs show a clear understanding of where this approach succeeds and where it does not. The author explicitly flags a 13% token overhead for cold-start single-question sessions, caused by the MCP tool-schema preamble, which is only amortized across multi-turn sessions. Language support is also tiered: JavaScript and TypeScript receive deep, framework-aware extraction passes for Express, NestJS, Next.js, Redux Toolkit, and React Router, while Python, Ruby, Go, Java, and Rust fall back to plain tree-sitter AST extraction, and a broader set of languages including C, Kotlin, and Swift use a generic structural extractor. The author also carefully distinguishes between a measured 1.5-million-token structural estimate for a multi-repo scenario and a prompt that was actually sent, a level of methodological transparency that is notable in a space where benchmark claims are frequently inflated.
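To see why a fixed preamble cost hurts single-question sessions but fades over longer ones, consider a toy cost model. Only the 13% single-question overhead comes from the article; the per-question token counts, the preamble size, and the assumption that the preamble is a one-off cost per session are hypothetical placeholders chosen to reproduce that 13%.

```typescript
// Toy model. All numbers are hypothetical except the 13% overhead they reproduce.
interface SessionCosts {
  preambleTokens: number;      // assumed one-off cost of the MCP tool schema
  baselinePerQuestion: number; // assumed tokens per question without the index
  indexedPerQuestion: number;  // assumed tokens per question with the index
}

function sessionTotals(c: SessionCosts, questions: number): { baseline: number; indexed: number } {
  return {
    baseline: questions * c.baselinePerQuestion,
    indexed: c.preambleTokens + questions * c.indexedPerQuestion,
  };
}

// Smallest number of questions at which the indexed session is strictly cheaper.
function breakEvenQuestions(c: SessionCosts): number {
  const savedPerQuestion = c.baselinePerQuestion - c.indexedPerQuestion;
  if (savedPerQuestion <= 0) return Infinity; // the preamble is never paid back
  return Math.floor(c.preambleTokens / savedPerQuestion) + 1;
}

// Placeholder values: one question costs 11,300 vs 10,000 tokens, i.e. 13% more.
const example: SessionCosts = {
  preambleTokens: 7300,
  baselinePerQuestion: 10000,
  indexedPerQuestion: 4000,
};

for (const n of [1, 2, 5]) {
  const t = sessionTotals(example, n);
  console.log(`${n} question(s): baseline ${t.baseline}, indexed ${t.indexed}`);
}
console.log(`break-even at ${breakEvenQuestions(example)} questions`);
```

Under these made-up values the indexed session is more expensive for one question and cheaper from the second question onward. The real break-even point depends on the actual preamble size and per-question savings, which the article does not break down.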

Graphify-ts sits within a rapidly expanding ecosystem of tools designed to make LLM coding agents more economically viable at scale. Token costs are not merely an inconvenience; at production usage volumes with models like Claude Opus 4.7, per-query overhead compounds significantly, and context-window constraints create hard limits on what questions can be asked at all, as illustrated by the multi-repo scenario, where a naive prompt would have exceeded any available context window. MCP, which Anthropic introduced to standardize how external tools and data sources connect to Claude, has become fertile ground for this category of optimization tooling. Graphify-ts is one of the more architecturally sophisticated entries, combining information retrieval techniques from search engineering with graph-theoretic community detection, applied specifically to the problem of understanding code structure. The broader trend it reflects is a shift from treating LLMs as autonomous explorers of large information spaces toward a hybrid architecture where classical indexing and retrieval systems pre-filter the search space before the model ever sees a token.
