Detailed Analysis
A developer using Claude Code has surfaced a structural inefficiency in the platform's skill architecture that generates significant, invisible token costs for users with large skill libraries. By instrumenting their own environment, the author discovered that all 2,596 of their installed skills — sourced from three separate marketplaces — were being loaded wholesale into the system prompt at the start of every session, consuming 102,651 tokens per session regardless of whether any given skill was ever invoked. With only 40 skills ever actually called upon, the effective waste rate stood at 98.6%, translating to approximately $91 per month in token charges that appeared on no dedicated billing line and triggered no user-facing warning.
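The figures above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the skill counts and token total come from the text, while the function `monthly_overhead_cost` and its parameters `sessions_per_month` and `price_per_million_tokens` are hypothetical, since the source gives neither the session count nor the token price behind the $91 figure.

```python
# Figures quoted in the analysis above.
INSTALLED_SKILLS = 2_596
INVOKED_SKILLS = 40
OVERHEAD_TOKENS_PER_SESSION = 102_651


def unused_share(installed: int, invoked: int) -> float:
    """Fraction of installed skills that were never invoked."""
    return (installed - invoked) / installed


def monthly_overhead_cost(tokens_per_session: int,
                          sessions_per_month: int,
                          price_per_million_tokens: float) -> float:
    """Hypothetical cost model: fixed per-session overhead times session count.

    Both sessions_per_month and price_per_million_tokens are assumptions
    supplied by the caller; neither value appears in the source.
    """
    return tokens_per_session * sessions_per_month * price_per_million_tokens / 1_000_000


if __name__ == "__main__":
    print(f"unused share by count: {unused_share(INSTALLED_SKILLS, INVOKED_SKILLS):.1%}")
    print(f"average tokens per skill: {OVERHEAD_TOKENS_PER_SESSION / INSTALLED_SKILLS:.1f}")
```

By skill count the unused share works out to about 98.5%, slightly below the quoted 98.6%. The source's figure may have been measured differently (for example by tokens rather than by count), which the text does not specify.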
The technical cause is architectural rather than incidental. Claude Code's harness must enumerate all available skills within the system prompt so the model can "see" and reason about them — a design pattern common across tool-augmented language model systems. The problem is one of scale: as skill marketplaces proliferate and users accumulate integrations across multiple sources, the per-session overhead compounds rapidly. The author notes that 96% of their cost traces to a single marketplace, suggesting that individual ecosystem actors can impose disproportionate token burdens on users who subscribe to their catalogs without necessarily curating their installs.
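A per-marketplace breakdown like the 96% figure could be derived by grouping per-skill token counts by source. This is a minimal sketch under assumed inputs: the `(marketplace, tokens)` pair format, the function name, and the sample data are all hypothetical and are not taken from the author's tool.

```python
from collections import defaultdict


def token_share_by_marketplace(skills: list[tuple[str, int]]) -> dict[str, float]:
    """Return each marketplace's fraction of the total enumeration tokens.

    `skills` is a list of (marketplace_name, token_count) pairs, one per skill.
    """
    totals: dict[str, int] = defaultdict(int)
    for marketplace, tokens in skills:
        totals[marketplace] += tokens
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {name: 0.0 for name in totals}
    return {name: count / grand_total for name, count in totals.items()}


# Made-up example data; not taken from the source.
example = [("market-a", 480), ("market-a", 480), ("market-b", 30), ("market-c", 10)]
shares = token_share_by_marketplace(example)
```

A breakdown of this kind makes it easy to see whether one catalog dominates the overhead before deciding which installs to review first.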
The author responded by releasing an open-source auditing tool, `skill-tax`, available on GitHub. It offers three commands (`audit`, `usage`, and `report`) to quantify per-skill token cost, surface actual invocation history mined from local transcripts, and rank cleanup candidates by cost-to-usage ratio. Critically, the tool makes no deletions and takes no automated action; it surfaces data and leaves decisions to the user. This reflects a design philosophy oriented toward transparency rather than intervention, which is appropriate given that users may have legitimate reasons to retain rarely invoked skills.
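Ranking by cost-to-usage ratio can be sketched as follows. This is not the `skill-tax` implementation, whose internals the source does not describe; the `SkillStat` type, the scoring rule, and the sample data are assumptions made for illustration. Like the tool as described, the sketch only returns an ordering and deletes nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillStat:
    name: str          # hypothetical skill identifier
    tokens: int        # tokens the skill's entry adds to the system prompt
    invocations: int   # times the skill was invoked in the observed history


def rank_cleanup_candidates(stats: list[SkillStat]) -> list[SkillStat]:
    """Order skills from most to least wasteful.

    The score is tokens per invocation, with +1 in the denominator so that
    never-invoked skills get a finite score equal to their full token cost.
    This scoring rule is an assumption made for illustration.
    """
    return sorted(stats, key=lambda s: s.tokens / (s.invocations + 1), reverse=True)


# Made-up data for illustration only.
stats = [
    SkillStat("pdf-export", tokens=60, invocations=30),
    SkillStat("legacy-linter", tokens=45, invocations=0),
    SkillStat("db-migrate", tokens=80, invocations=3),
]
ranked = rank_cleanup_candidates(stats)
```

With this scoring, a never-invoked skill outranks a larger skill that is used occasionally, which in turn outranks a frequently used one.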
The broader significance of this finding sits at the intersection of AI cost accountability and the emerging economics of tool-augmented models. As AI systems move toward richer, more extensible ecosystems, with plugin stores, MCP servers, and skill marketplaces becoming standard infrastructure, the token overhead of capability enumeration becomes a real and recurring line item in operating costs. Unlike compute costs for inference on user queries, this overhead is structural: it accrues per session, not per task, and scales with library size rather than with actual utilization. The author's phrase, "context is the new RAM", captures the analogy: just as bloated processes once consumed finite memory in operating systems, over-installed skill sets now consume finite context windows, hurting cost efficiency and potentially degrading model performance as the prompt fills with irrelevant capability declarations.
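The RAM analogy can be made concrete by expressing the enumeration overhead as a share of the context window. In this rough sketch the 102,651-token overhead is the figure quoted earlier, while the 200,000-token window size is an assumption chosen for illustration; the source does not state which window the author was working with.

```python
def context_fraction(overhead_tokens: int, window_tokens: int) -> float:
    """Share of the context window consumed before any user input arrives."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return overhead_tokens / window_tokens


# 102,651 is the per-session overhead quoted in the text.
# The 200,000-token window is an assumption for illustration only.
ASSUMED_WINDOW_TOKENS = 200_000
fraction = context_fraction(102_651, ASSUMED_WINDOW_TOKENS)
```

Under that assumed window size, roughly half of the context would be occupied by skill declarations before the first user message.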
This episode also highlights a gap in how AI platform vendors currently surface cost attribution. Traditional software billing is generally tied to active resource consumption; token-based billing on context loading represents a subtler model where passive presence generates charges. Until platforms build native tooling to surface per-integration token costs, third-party auditing tools like `skill-tax` will fill that transparency void — but their existence as a workaround rather than a built-in feature points to an emerging design responsibility for AI platform developers as these ecosystems mature.