
Spent a week building a CLI so my AI agent would stop spawning a fresh browser every time

Reddit · chocate · April 24, 2026
Ghax is a CLI tool that keeps a persistent Chrome DevTools Protocol session open through a background daemon instead of launching a fresh browser for each command, which sharply reduces cold-start overhead. Benchmarks show warm Ghax commands taking 49 milliseconds per call versus 58-680 milliseconds for competing tools, and text extraction running roughly 9 times faster (154 milliseconds versus 1,404 milliseconds). Its feature set for AI agent browser automation includes accessibility-tree snapshots, shadow-DOM traversal, extension internals inspection, and batch execution.

Detailed Analysis

A developer identified by the GitHub handle "kepptic" has released Ghax, an open-source Rust CLI designed to remove a basic inefficiency in AI agent browser automation: the cold-start cost of launching a fresh browser instance for every discrete command. Ghax keeps a persistent Chrome DevTools Protocol (CDP) session open through a small background daemon (an approximately 80 KB Node.js bundle), so the browser initialization cost is paid once per session rather than once per invocation. In benchmarks run on Apple Silicon against example.com, Ghax completed a full cold-start workflow in 1.56 seconds, compared with 6.70 seconds for gstack-browse and 5.13 seconds for playwright-cli, and its warm per-command latency was 49 milliseconds. On a real-world Wikipedia page of roughly 250 KB, Ghax extracted text in 154 milliseconds versus 1,404 milliseconds for playwright-cli, a roughly 9× improvement (1,404 / 154 ≈ 9.1), because it queries a DOM that is already parsed and resident instead of booting a new browser process to retrieve it.
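To make the amortization concrete, here is a toy cost model comparing the two designs for an agent task of n calls: spawning a browser per command versus paying a one-time daemon startup and then a warm per-call cost. The per-call figures (1,404 ms and 154 ms for text extraction) are the benchmark numbers quoted above; the 1,500 ms startup is a hypothetical placeholder, not a measured Ghax figure, and the model ignores page loads, LLM think-time, and run-to-run variance.

```python
"""Toy cost model: spawn-per-command vs. one persistent browser session."""

# Per-call text-extraction latencies quoted in the article (milliseconds).
SPAWN_PER_CALL_MS = 1404  # playwright-cli: boots a browser for each command
WARM_PER_CALL_MS = 154    # Ghax: queries an already-resident DOM

# HYPOTHETICAL one-time daemon + browser startup; not a measured Ghax figure.
DAEMON_STARTUP_MS = 1500


def spawn_per_call_total(n_calls: int) -> int:
    """Every command pays the full cold-start-inclusive latency."""
    return n_calls * SPAWN_PER_CALL_MS


def persistent_session_total(n_calls: int) -> int:
    """Startup is paid once; each command then pays only the warm latency."""
    return DAEMON_STARTUP_MS + n_calls * WARM_PER_CALL_MS


def break_even_calls() -> int:
    """Smallest call count at which the persistent session is strictly faster."""
    n = 1
    while persistent_session_total(n) >= spawn_per_call_total(n):
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 5, 30):
        print(f"{n:>2} calls: spawn {spawn_per_call_total(n):>6} ms, "
              f"session {persistent_session_total(n):>6} ms")
    print("break-even at", break_even_calls(), "calls")
```

With these inputs, 30 extraction calls cost about 42.1 s when each one spawns a browser versus about 6.1 s with a warm session, and the session pulls ahead from the second call. Because startup is paid once, the gap grows linearly with the number of agent steps; a larger real startup cost only shifts the break-even point.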

Beyond raw performance, Ghax includes several features aimed at common sources of brittleness in agent-driven browser automation. Its accessibility-tree snapshot assigns stable `@e<n>` references to elements identified by ARIA role and accessible name rather than by CSS selectors, which are notoriously fragile across UI updates. A dialog-aware walker ensures that when a modal is open, the snapshot traverses the modal's subtree rather than the aria-hidden background, so the agent does not try to interact with obscured elements. Chain selectors (`host >> inner`) traverse shadow DOM boundaries, extending support to component-based applications built with libraries such as Lit and Shoelace. Other capabilities include access to Manifest V3 (MV3) extension internals; HAR-format network capture with source-map resolution, which maps minified bundle coordinates back to the original TypeScript file locations; Core Web Vitals measurement; and batch execution that automatically re-snapshots between steps. The re-snapshotting handles dynamic UI changes, such as a combobox reshuffling its options, that would otherwise leave element references stale.
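The ref-assignment and modal-scoping ideas can be illustrated on a toy tree. This is a minimal sketch under stated assumptions, not Ghax's implementation: `AXNode`, `walk_root`, `snapshot`, and `resolve` are hypothetical names, the set of "interactive" roles is an assumption, and the hand-built tree stands in for accessibility data a real tool would fetch over CDP (for example via the Accessibility domain's `getFullAXTree`). Refs are stored as (role, name, occurrence) keys, and an open dialog with `aria-modal` becomes the walk root.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Roles this sketch treats as actionable (an assumption, not Ghax's list).
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox", "option"}


@dataclass
class AXNode:
    """Toy accessibility node; a real tool would build these from CDP data."""
    role: str
    name: str = ""
    modal: bool = False  # stands in for aria-modal="true"
    children: list[AXNode] = field(default_factory=list)


def iter_nodes(node: AXNode):
    """Yield nodes depth-first, in document order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def walk_root(tree: AXNode) -> AXNode:
    """Dialog-aware root: if a modal dialog is open, skip the inert background."""
    for node in iter_nodes(tree):
        if node.role == "dialog" and node.modal:
            return node
    return tree


def snapshot(tree: AXNode) -> dict[str, tuple[str, str, int]]:
    """Assign @e<n> refs to interactive nodes, keyed by role and name."""
    refs: dict[str, tuple[str, str, int]] = {}
    seen: dict[tuple[str, str], int] = {}
    for node in iter_nodes(walk_root(tree)):
        if node.role in INTERACTIVE_ROLES:
            occurrence = seen.get((node.role, node.name), 0)
            seen[(node.role, node.name)] = occurrence + 1
            refs[f"@e{len(refs) + 1}"] = (node.role, node.name, occurrence)
    return refs


def resolve(tree: AXNode, key: tuple[str, str, int]) -> AXNode | None:
    """Re-find a ref's node by role and name in a fresh (re-rendered) tree."""
    role, name, occurrence = key
    matches = [n for n in iter_nodes(walk_root(tree)) if (n.role, n.name) == (role, name)]
    return matches[occurrence] if occurrence < len(matches) else None


if __name__ == "__main__":
    page = AXNode("document", children=[
        AXNode("button", "Sign in"),
        AXNode("dialog", "Cookies", modal=True, children=[
            AXNode("button", "Accept"),
            AXNode("button", "Reject"),
        ]),
    ])
    # The modal is open, so "Sign in" in the background gets no ref.
    print(snapshot(page))  # {'@e1': ('button', 'Accept', 0), '@e2': ('button', 'Reject', 0)}
```

Keying refs on role and name means a restyled or re-nested button still resolves, where a CSS path such as `div > div:nth-child(3) > button` would break. The trade-off is ambiguity when several elements share a role and name, which the occurrence index only partly addresses; re-snapshotting between batch steps is one way to keep such keys fresh.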

The broader context for this tool sits at the intersection of AI agent tooling and developer frustration with existing automation primitives. General-purpose browser automation libraries like Playwright and Puppeteer were designed primarily for deterministic test scripts, not for the interactive, iterative, and often unpredictable command patterns that language model agents generate. Each tool call from an LLM-based agent is a discrete decision point, and the cost of spawning a new browser process for each one, compounded across dozens of agent steps in a single task, adds up to noticeable user-facing delays and slower feedback loops, which can also raise costs. Ghax's daemon-session model responds directly to this mismatch by treating the browser as a persistent service rather than a one-shot utility.

This release reflects a rapidly maturing ecosystem of developer-built infrastructure around agentic AI workflows, particularly for Claude and similar frontier models. Anthropic's own Claude Code CLI has introduced sandboxing features, permission classifiers, and sub-agent configuration controls to manage agent behavior and reduce unauthorized spawning, but those mechanisms address security and permission scoping rather than the performance profile of browser sessions. Ghax occupies a complementary space: rather than competing with permission management, it optimizes the transport layer between an agent's reasoning loop and the browser environment. That a developer spent a week building it as a side project suggests how significant the cold-start tax has become as agent workflows grow more complex and browser interaction becomes a routine primitive rather than a specialized capability.

The project's technical choices are themselves noteworthy signals about the direction of AI tooling. Writing the CLI in Rust yields a ~3 MB stripped binary with a ~20 ms cold start for single-command invocations, reflecting a priority on predictable, low-overhead startup that interpreted or JIT-compiled runtimes cannot reliably guarantee. The split between the Rust CLI (thin, fast, stateless) and the Node daemon (stateful, CDP-aware) mirrors an architectural pattern increasingly common in developer tools that bridge systems programming and the Node/browser ecosystem. As AI agents are asked to perform longer multi-step web interactions, such as filling forms, navigating SPAs, and inspecting extension internals, purpose-built tools like Ghax are likely to proliferate and may eventually push mainstream automation frameworks to treat persistent sessions as a first-class feature rather than an afterthought.
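The thin-client/stateful-daemon split can be sketched in a few dozen lines. This is a hypothetical illustration, not Ghax's wire protocol: the daemon here is a Python `socketserver` server holding one stand-in "browser" (a launch counter plus a current URL), and `send_command` plays the stateless CLI that connects, sends one JSON line, and exits. The command names `goto` and `url`, the JSON-over-localhost-TCP framing, and all function names are assumptions; the article describes Ghax's real daemon as a Node.js process that talks CDP to Chrome.

```python
from __future__ import annotations

import json
import socket
import socketserver
import threading


class FakeBrowser:
    """Stand-in for an expensive, long-lived browser/CDP session."""

    launches = 0  # counts cold starts across the process

    def __init__(self) -> None:
        FakeBrowser.launches += 1  # in reality: launch Chrome, attach over CDP
        self.url = "about:blank"


class DaemonHandler(socketserver.StreamRequestHandler):
    """Stateful side: each request reuses the server's single browser."""

    def handle(self) -> None:
        request = json.loads(self.rfile.readline())
        browser = self.server.browser
        cmd = request.get("cmd")
        if cmd == "goto":
            browser.url = request["url"]
            reply = {"ok": True}
        elif cmd == "url":
            reply = {"ok": True, "url": browser.url}
        else:
            reply = {"ok": False, "error": f"unknown command: {cmd!r}"}
        self.wfile.write(json.dumps(reply).encode() + b"\n")


class Daemon(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # don't let handler threads block interpreter exit

    def __init__(self, addr: tuple[str, int]) -> None:
        super().__init__(addr, DaemonHandler)
        self.browser = FakeBrowser()  # cold start paid once, at daemon boot


def start_daemon() -> Daemon:
    """Bind to an OS-chosen localhost port and serve in a background thread."""
    daemon = Daemon(("127.0.0.1", 0))
    threading.Thread(target=daemon.serve_forever, daemon=True).start()
    return daemon


def send_command(port: int, **request) -> dict:
    """Thin, stateless client: connect, send one JSON line, read one reply."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        conn.sendall(json.dumps(request).encode() + b"\n")
        with conn.makefile("rb") as reader:
            return json.loads(reader.readline())
```

After `start_daemon()`, calling `send_command(port, cmd="goto", url="https://example.com")` and then `send_command(port, cmd="url")` returns the URL set by the first call, because the state lives in the daemon rather than in either client invocation, and the browser is constructed only once. A production design would also need access control (for example a permission-restricted Unix-domain socket), per-session locking, and daemon lifecycle management, none of which this sketch attempts.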
