I got tired of not knowing what my coding agent would cost, so I built a free local tool that caps it

A solo developer created Runcap in response to coding agents repeatedly burning tokens while looping on errors without advance cost visibility. The tool estimates run costs before execution, enforces hard spending caps through a local gateway that returns 429 status when limits are crossed, compresses requests to reduce token usage, and provides rescue prompts when agents get stuck. Runcap is distributed as free open-source software under MIT license, runs entirely locally without cloud dependencies, and is installable via npm with only Node.js as a requirement.

Detailed Analysis

A solo developer frustrated with unpredictable token costs from AI coding agents — particularly Claude Code and Cursor — has released an open-source tool called Runcap, designed to give developers proactive cost control rather than retrospective reporting. The core problem the builder identifies is a well-known failure mode in agentic AI workflows: agents entering loops on unresolved errors, repeatedly rewriting plans and consuming tokens without producing useful progress, while the developer remains unaware of the accumulating cost until a billing statement or subscription limit surfaces the damage. Runcap addresses this by operating as a local proxy gateway that estimates cost ranges before a run begins and enforces a hard ceiling by returning HTTP 429 errors the moment spending crosses a defined threshold.

The technical implementation is deliberately lightweight. Built in pure Node.js with no Python or machine learning dependencies, Runcap runs entirely on the user's local machine and compresses request payloads — targeting JSON, logs, and stack traces rather than prose or code — before forwarding them to the underlying model APIs. The tool also includes a "rescue prompt" feature intended to help agents recover when they become stuck in repetitive failure cycles. The MIT license and npm distribution lower the barrier to adoption considerably, and the privacy-first, local-only architecture directly addresses the data sensitivity concerns many developers have about routing work through third-party cloud services.

The broader context here is significant. As Claude Code, Cursor, and similar agentic coding tools have matured into genuinely useful development environments, the cost unpredictability of long-running agent sessions has emerged as one of the most cited friction points among professional and hobbyist developers alike. Unlike single-turn completions, agentic loops can compound token usage exponentially when error recovery goes poorly, and existing tooling from API providers largely surfaces this information after the fact through dashboards and invoices. Runcap represents a category of developer tooling — local, cost-aware middleware — that the ecosystem has been slow to produce despite clear demand.

This development also reflects a widening gap between the capabilities Anthropic has shipped in Claude Code and the surrounding infrastructure developers need to deploy those capabilities confidently in production or high-iteration workflows. Third-party developers are beginning to fill that gap independently. The question the builder poses at the end of the post — what would make developers actually keep the tool running daily — signals awareness that the hardest challenge is not technical but behavioral: integrating a new tool into an already complex development workflow requires it to be nearly invisible. Whether Runcap achieves that depends heavily on how reliably its cost estimates track actual usage across different agent frameworks and model versions, a challenge that will only grow as model pricing and tokenization schemes continue to evolve.

Read original article →

Detailed Analysis

Don't Miss a Deploy