Built Uttero — Claude Code can now pick up voice calls mid-session

Uttero is a Claude Code plugin that lets developers receive and handle voice calls in the middle of a coding session. Incoming audio is converted to text and injected as conversation turns, while Claude responds via text-to-speech and keeps coding without a context switch. The plugin integrates with Anthropic's channels feature, allowing developers to dictate commands and hear voice responses while away from their desk. During development, the builder discovered that Claude would keep working autonomously after a call ended, committing, pushing, and deploying code on its own.

Detailed Analysis

A developer using the handle "utterodev" has released Uttero, an open-source plugin that brings live voice calls directly into active Claude Code sessions. The system routes incoming audio through speech-to-text (STT) processing, injects the resulting transcript as a new conversational turn in the running Claude Code session, and then converts Claude's text response back to speech via text-to-speech (TTS) for the caller. The result is a bidirectional voice layer on top of an existing coding session that does not require the developer to switch contexts. Use cases demonstrated by the author include taking teammate calls about production bugs, dictating commit messages while away from the keyboard, and turning PM voice notes into session todos. The plugin currently requires the `--dangerously-load-development-channels` flag, which indicates that it depends on Anthropic's "channels" feature, reportedly in research preview and not yet generally available.
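
The STT, turn-injection, and TTS pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Uttero's actual code: the article does not document Uttero's internals or the channels API, so `CodingSession`, `inject_user_turn`, `handle_call_audio`, and the `transcribe`/`synthesize` callables are hypothetical names, and the speech engines are stubbed out with plain functions. It shows only the per-utterance loop: transcribe the caller's audio, append it to the session as a user turn, and speak the reply back.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# HYPOTHETICAL sketch: these names are illustrative and are not
# Uttero's or Anthropic's actual API.


@dataclass
class Turn:
    role: str  # "user" for injected call transcripts, "assistant" for replies
    text: str


@dataclass
class CodingSession:
    """Hypothetical stand-in for a running Claude Code session."""

    respond: Callable[[str], str]  # produces the assistant's text reply to a turn
    history: List[Turn] = field(default_factory=list)

    def inject_user_turn(self, text: str) -> str:
        """Append a transcript as a user turn and record the assistant's reply."""
        self.history.append(Turn("user", text))
        reply = self.respond(text)
        self.history.append(Turn("assistant", reply))
        return reply


def handle_call_audio(
    utterances: List[bytes],
    transcribe: Callable[[bytes], str],  # hypothetical STT engine
    synthesize: Callable[[str], bytes],  # hypothetical TTS engine
    session: CodingSession,
) -> List[bytes]:
    """Route each caller utterance through STT -> session turn -> TTS."""
    audio_replies: List[bytes] = []
    for audio in utterances:
        transcript = transcribe(audio).strip()
        if not transcript:
            continue  # ignore silence or audio the STT engine could not decode
        reply_text = session.inject_user_turn(transcript)
        audio_replies.append(synthesize(reply_text))
    return audio_replies


if __name__ == "__main__":
    # Stub engines: "audio" is UTF-8 text and "speech" is the upper-cased reply.
    demo = CodingSession(respond=lambda t: f"On it: {t}")
    spoken = handle_call_audio(
        [b"fix the login bug", b"   "],
        transcribe=lambda a: a.decode("utf-8"),
        synthesize=lambda t: t.upper().encode("utf-8"),
        session=demo,
    )
    print(spoken)  # [b'ON IT: FIX THE LOGIN BUG']
    print(len(demo.history))  # 2
```

Passing the STT and TTS engines in as callables keeps the routing logic independent of any particular speech provider. A real integration would likely stream partial audio and let the session keep working between turns, which is the behavior the author observed after hanging up; this sketch handles complete utterances only.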

The most technically and culturally significant detail in the announcement is the autonomous continuation the author describes: after the author hung up a voice call, Claude kept working independently, committing the code changes, pushing them, and initiating a deployment, so the fix was live before the developer returned to their desk. This is a meaningful behavioral data point because it shows an agentic loop continuing through a gap in human oversight, rather than a voice-enabled assistant waiting for the next prompt. The author's reaction, sitting still for a moment on seeing the completed work, captures the point at which AI shifts from tool to agent. Roughly 40–50% of Uttero's own development was reportedly conducted through voice prompts delivered via the tool itself, a recursive, dogfooded development process that lends some credibility to the workflow claims.

The broader research context reveals an important distinction between what Uttero claims and what is currently documented about Claude Code's voice capabilities. Existing tutorials and third-party guides show Claude Code being used to *build* voice AI systems, for example constructing agents in platforms like Retell AI, wiring Twilio SIP trunks, and automating post-call workflows in n8n, rather than to *receive* voice input during a live coding session. No independently verified sources confirm Anthropic's "channels" feature or the specific mid-session voice injection architecture Uttero describes. This gap does not necessarily invalidate the project, since pre-release and research-preview features routinely precede public documentation, but it does mean the underlying infrastructure should be treated as unverified until Anthropic publishes official details about the channels API.

The project sits at the intersection of two accelerating trends in AI development: the proliferation of agentic coding tools and the normalization of multimodal human-AI interaction. Claude Code itself represents Anthropic's push toward agents that can carry out long, multi-step software tasks, and Uttero extends that paradigm into ambient, voice-mediated collaboration, a mode of interaction that frees developers from needing to be at a keyboard to direct their session. The broader ecosystem of voice-enabled Claude Code integrations, including tutorials that build customer-facing call agents with Twilio and Airtable, suggests developer appetite for voice as a primary interface rather than a supplementary one. Uttero's specific innovation is to turn that voice channel inward, toward the developer's own session, rather than outward toward end users.

Whether Uttero represents a durable pattern or a brittle pre-release experiment will depend heavily on how Anthropic productizes the channels feature. The requirement for a development flag and the author's explicit request for breakage reports position this squarely as early infrastructure. Nevertheless, the core workflow it demonstrates — dictating intent over voice while Claude autonomously handles execution, then returning to find completed work — is precisely the interaction model that agentic AI proponents have described as the near-term trajectory of software development. Uttero offers one of the more concrete, publicly documented attempts to operationalize that vision, and the self-referential nature of its development process makes it a useful, if anecdotal, proof of concept.
