90%+ fewer tokens per session by reading a pre-compiled wiki instead of exploring files cold. Built from Karpathy's workflow.

Codesight, a free and open-source tool, cuts Claude's token consumption from 47,450 to 360 tokens per session on a 40-file FastAPI benchmark by pre-compiling the codebase into structured domain articles instead of requiring fresh file exploration. The tool uses the TypeScript compiler API for TypeScript and regex detection for other languages to extract routes, field types, and middleware directly from source code without relying on LLM reasoning. Every new session begins with this context available immediately, avoiding the repeated token cost of codebase exploration.

Detailed Analysis

Codesight, an open-source developer tool released under the MIT license, addresses a fundamental inefficiency in AI-assisted coding workflows: the repeated, expensive token cost of having large language models like Claude explore a codebase from scratch at the start of every new session. Built by the developer behind the GitHub repository Houseofmvps/codesight and directly inspired by AI researcher Andrej Karpathy's "LLM Knowledge Bases" framework, the tool compiles a codebase into a structured wiki of domain-specific markdown articles (covering topics such as authentication, database schemas, and payment flows) that Claude can read at session start rather than ingesting raw source files. On a 40-file FastAPI project, this reduces context consumption from approximately 47,450 tokens to just 360 tokens, a reduction exceeding 99% in that specific benchmark. Across three real-world codebases tested by the author, token usage fell by an average factor of 11.2, and the most efficient usage pattern, serving only a 200-token index at session start and pulling targeted articles per query, yielded up to 91x token savings.
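
The FastAPI figures can be checked with a few lines of arithmetic. The sketch below uses only the two token counts quoted above; the function name `reduction` is illustrative and not part of Codesight.

```python
def reduction(before_tokens: int, after_tokens: int) -> tuple[float, float]:
    """Return (percent_reduction, fold_reduction) for a before/after token count."""
    percent = (1 - after_tokens / before_tokens) * 100
    fold = before_tokens / after_tokens
    return percent, fold


# Token counts reported for the 40-file FastAPI benchmark.
percent, fold = reduction(47_450, 360)
print(f"{percent:.2f}% fewer tokens, {fold:.1f}x smaller")
# prints "99.24% fewer tokens, 131.8x smaller"
```

This single-project result is far above the 11.2x average reported across the three codebases, so it is best read as a best case rather than a typical one.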

The technical implementation is notably disciplined in its avoidance of LLM inference during the compilation step itself. Codesight uses the TypeScript compiler API for TypeScript codebases and regex-based detection for Python, Go, Ruby, Rust, Java, Kotlin, Elixir, and PHP, completing the full wiki generation in approximately 200 milliseconds with zero external API calls. This design choice matters significantly: because the wiki is derived deterministically from the source code (through the abstract syntax tree for TypeScript and pattern matching for the other languages) rather than from model reasoning, the data it surfaces, including full route paths, field types, foreign keys, and middleware chains, is reproducible and cannot be hallucinated by a model. Regex matching is less certain than AST parsing, so the tool applies a transparent epistemic marker, tagging regex-inferred routes as `[inferred]` so that Claude knows to verify those claims before acting on them, effectively encoding a confidence gradient directly into the context it receives.
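
To make the regex path concrete, here is a minimal sketch of how FastAPI-style route decorators could be detected and tagged. It is not Codesight's actual code: the pattern, the `extract_routes` helper, and the sample source are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: matches decorators such as @app.get("/users/{id}")
# or @router.post('/pay'). It is an illustration, not Codesight's own regex.
ROUTE_DECORATOR = re.compile(
    r"""@(?P<owner>\w+)\.(?P<method>get|post|put|patch|delete)\(\s*["'](?P<path>[^"']+)["']"""
)


def extract_routes(source: str) -> list[str]:
    """Return 'METHOD /path [inferred]' entries found by pattern matching."""
    routes = []
    for match in ROUTE_DECORATOR.finditer(source):
        method = match.group("method").upper()
        # Regex matches are guesses about structure, so each one is tagged.
        routes.append(f"{method} {match.group('path')} [inferred]")
    return routes


SAMPLE = '''
@app.get("/users/{user_id}")
def read_user(user_id: int): ...

@router.post('/payments')
def create_payment(): ...
'''

for line in extract_routes(SAMPLE):
    print(line)
```

A pattern like this would miss routes registered dynamically or built from string concatenation, which is presumably the kind of gap the `[inferred]` tag is meant to signal.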

The broader significance of Codesight lies in what it reveals about the emerging practice Karpathy has popularized as "context engineering": the deliberate design of what information an LLM receives, in what form, and at what granularity, rather than simply prompting a model and allowing it to explore freely. Karpathy's public commentary on the subject noted that the space remained an opportunity for a serious product rather than a loose collection of scripts, and Codesight represents a direct response to that framing. The approach treats the context window not as a dump site for raw files but as a curated, structured input surface. That shift in philosophy has practical consequences for both cost and performance, given that most LLM providers, including Anthropic, charge in proportion to tokens processed.

This development connects to a wider trend in 2025 and 2026 in which developers are increasingly managing Claude and other LLMs not through prompt phrasing alone but through upstream infrastructure decisions about data shape and retrieval. Tools such as retrieval-augmented generation pipelines, memory layers, and now codebase wiki compilers all reflect the same underlying insight: that LLMs perform better and more cheaply when given structured, pre-filtered context rather than being tasked with the retrieval and synthesis of information themselves. Codesight's approach is particularly lean in that it requires no vector database, no embedding model, and no runtime inference — it is essentially a static compilation step that makes the model's job easier before the session begins.
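
The index-then-article pattern needs nothing beyond plain lookups, which is what makes it lean. The following sketch assumes a hypothetical three-article wiki held in a dictionary; the article bodies, `build_index`, and `articles_for` are invented for illustration and do not reflect Codesight's file layout or API.

```python
# Hypothetical wiki: article name -> markdown body.
# A real tool would read generated files from disk instead.
WIKI = {
    "auth": "# Auth\nPOST /login issues a session token; middleware: require_user.",
    "database": "# Database\nusers(id, email); payments(id, user_id -> users.id).",
    "payments": "# Payments\nPOST /payments creates a charge; requires auth.",
}


def build_index(wiki: dict[str, str]) -> str:
    """Build the small index served at session start: one line per article."""
    lines = [
        f"- {name}: {body.splitlines()[0].lstrip('# ')}"
        for name, body in sorted(wiki.items())
    ]
    return "\n".join(lines)


def articles_for(query: str, wiki: dict[str, str]) -> list[str]:
    """Pick articles whose name appears in the query (plain substring match)."""
    text = query.lower()
    return [name for name in sorted(wiki) if name in text]


print(build_index(WIKI))
print(articles_for("How do payments check auth?", WIKI))
# second print shows ['auth', 'payments']
```

Only the index is sent up front; full article bodies are fetched for the names that `articles_for` returns, so no embedding model or vector store is involved.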

For Claude specifically, the implications extend beyond cost. Token efficiency directly affects how much substantive reasoning and code generation Claude can perform within a single context window, since tokens spent on codebase exploration displace tokens available for actual problem-solving. By front-loading structural knowledge into a compact, queryable wiki, Codesight allows Claude to begin contributing meaningfully from the first message of a session rather than spending early exchanges on orientation. As codebases grow and AI coding assistants become more central to professional software development workflows, tools that pre-engineer context in this fashion are likely to become standard infrastructure rather than optional optimizations.
