Launch HN: Voker (YC S24) – Analytics for AI Agents

Voker is an agent analytics platform that provides AI product teams visibility into user interactions with their conversational agents through a lightweight SDK compatible with OpenAI, Anthropic, and Gemini. The platform analyzes conversations using three core primitives—Intents, Corrections, and Resolutions—to automatically categorize usage patterns and provide actionable insights without requiring engineers to manually debug logs or rely on unvalidated LLM summaries. It addresses a widespread problem where AI teams currently detect agent failures only through customer complaints, offering a free tier with 2,000 monthly events and paid plans starting at $80 per month.

Detailed Analysis

Voker, a Y Combinator S24 company founded by Alex and Tyler, is entering the AI infrastructure market with an agent analytics platform designed to give product teams structured visibility into how deployed AI agents perform in production. The company's core offering is a lightweight, LLM-stack-agnostic SDK that wraps calls to major AI providers — OpenAI, Anthropic, and Google Gemini — and processes the resulting interaction data into what Voker terms "analytics primitives": Intents, Corrections, and Resolutions. These three categories attempt to formalize the lifecycle of any conversational AI interaction, capturing what a user wanted, where the agent fell short and required user correction, and whether the agent ultimately delivered. The platform is available in both Python and TypeScript, with a free tier of 2,000 events per month and paid plans beginning at $80 per month.
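To make the three primitives concrete, here is a minimal sketch of how one annotated session might be represented. It is an illustration only and not Voker's SDK or schema: the names `AnnotatedSession`, `Resolution`, `needed_correction`, and the example intent label are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Resolution(Enum):
    """Hypothetical outcome label: did the agent ultimately deliver?"""
    RESOLVED = "resolved"
    UNRESOLVED = "unresolved"


@dataclass
class AnnotatedSession:
    """One conversation reduced to the three primitives (illustrative schema)."""
    session_id: str
    intent: str  # what the user wanted
    corrections: List[str] = field(default_factory=list)  # user messages that corrected the agent
    resolution: Resolution = Resolution.UNRESOLVED  # whether the agent delivered

    @property
    def needed_correction(self) -> bool:
        # True when the user had to steer the agent at least once.
        return len(self.corrections) > 0


# Example: the user wanted to cancel, had to correct the agent once, and succeeded.
session = AnnotatedSession(
    session_id="s-001",
    intent="cancel_subscription",
    corrections=["No, I meant cancel, not pause."],
    resolution=Resolution.RESOLVED,
)
```

A record of this shape keeps the three questions separate: what the user wanted, whether they had to intervene, and whether the agent delivered.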

The problem Voker is addressing reflects a critical maturity gap in enterprise AI deployment. A survey of YC founders conducted by the company found that more than 90% learned of agent failures solely through direct customer complaints — a reactive, anecdotal signal that arrives far too late to prevent churn or systematically improve agent configurations. Existing solutions occupy adjacent but insufficient niches: observability tools provide granular trace-level debugging but are accessible primarily to engineers and lack product-level interpretation; evaluation frameworks test for known failure modes but cannot surface unexpected patterns emerging in live usage; and traditional product analytics platforms built around click and pageview events are architecturally unsuited to handle the unstructured, conversational nature of agent interactions. Voker's value proposition is to occupy the space between these categories, offering insights at a layer of abstraction that both engineers and product managers can act on.

A notable design decision in Voker's architecture is the explicit separation of LLM usage from core data engineering work. The founders note that the most common informal substitute for dedicated agent analytics (uploading observability logs to frontier models like Claude or ChatGPT and requesting summary insights) is unreliable because LLMs do not perform rigorous statistical analysis or systematically classify every individual session. Voker instead uses LLMs only for annotation and intent extraction, while employing deterministic data engineering pipelines and hierarchical text classification for statistics and trend aggregation. The intended result is that the same annotated sessions always produce the same aggregate numbers, so reproducibility comes from the pipeline itself, while accuracy still depends on the quality of the LLM annotations feeding it. Both properties matter for any analytics product being used to make product decisions.
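The deterministic half of that split can be sketched in a few lines. The example below assumes each session has already been annotated upstream (by an LLM or otherwise) with an intent label and a resolved flag, then computes per-intent resolution rates with plain counting. The function name `resolution_rate_by_intent` and the sample labels are hypothetical, and this is not a description of Voker's actual pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def resolution_rate_by_intent(
    sessions: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Return the share of resolved sessions for each intent label.

    `sessions` holds (intent, was_resolved) pairs produced by an upstream
    annotation step. No model is called here, so the same input always
    yields the same output.
    """
    totals: Counter = Counter()
    resolved: Counter = Counter()
    for intent, was_resolved in sessions:
        totals[intent] += 1
        if was_resolved:
            resolved[intent] += 1
    # Sort the keys so the output order is stable across runs.
    return {intent: resolved[intent] / totals[intent] for intent in sorted(totals)}


# Hypothetical annotated sessions: (intent, was_resolved)
annotated = [
    ("cancel_subscription", True),
    ("cancel_subscription", False),
    ("update_billing", True),
    ("cancel_subscription", True),
]

rates = resolution_rate_by_intent(annotated)
for intent, rate in rates.items():
    print(f"{intent}: {rate:.0%} resolved")
```

Because every session is counted exactly once, the result covers the full population rather than a sample that a summarizing model happened to attend to.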

Voker's launch situates it at the intersection of two converging forces in the AI industry: the rapid proliferation of agent-based products across enterprise software, and the growing recognition that deploying an LLM is only the beginning of the engineering challenge. As AI agents move from prototype to production, the operational infrastructure surrounding them — monitoring, evaluation, observability, and now analytics — has become a distinct and increasingly competitive product category. The agent analytics framing Voker proposes, centered on intent resolution rather than latency or token usage, represents a shift toward user-outcome-oriented measurement that aligns more closely with product development workflows than with traditional MLOps tooling.

The broader significance of Voker's approach lies in its implicit argument that conversational AI interactions require a new analytical vocabulary. Existing frameworks borrowed from web analytics, APM, or ML evaluation were not designed to answer questions like "what percentage of users who wanted to accomplish X were ultimately successful?" at scale and without manual review. By proposing structured primitives that generalize across agent types and LLM providers, Voker is attempting to standardize how the industry measures agent quality — a prerequisite for the kind of systematic, data-driven iteration that has historically driven improvement in other software product categories. Whether this framing gains traction will depend heavily on whether product teams find these primitives meaningful across diverse agent use cases, and whether Voker can demonstrate that automated intent classification remains accurate enough at scale to be trusted for product decisions.
