I built an open source remote AI agent that controls your desktop from your phone

A developer created an open-source remote AI agent that enables desktop control from a phone, allowing the agent to observe screens, execute CLI commands, and click buttons while streaming progress in real time. The tool offers privacy-focused options including a local-only mode that avoids sending screenshots to cloud providers and support for locally-running vision models like OmniParser. The project, built using Claude Code and the BMAD method, is available for Android and desktop with an iOS version still in development.

Detailed Analysis

A developer has released an open-source remote compute agent, built primarily using Anthropic's Claude Code, that lets users control a full desktop environment from a mobile phone, including both CLI and GUI interactions. The tool streams live screen output to the phone, lets the user either issue natural language commands to a built-in AI agent or take manual control of the desktop directly, and supports multiple AI backends, including Claude Code, Codex, Gemini CLI, and OpenClaw, through a configurable "SKILL" system. The project was developed over approximately one month using the BMAD methodology, a structured agentic workflow that sequences separate AI agent personas through phases such as PRD creation, architecture design, epic and story definition, and implementation. Android APK and desktop client builds are available via GitHub releases; an iOS version is still in development, pending App Store distribution logistics.

The project addresses a meaningful gap in the existing remote desktop tooling landscape. While solutions for remote access to machines already exist — such as traditional VNC, RDP, or newer agent-oriented tools like OpenClaw — most do not provide a unified interface that combines live visual output, agentic task execution, and interactive CLI control from a mobile device. The developer's stated motivation is practical: monitoring and steering long-running AI coding sessions without being physically tethered to a workstation. This reflects a broader pattern in developer tooling where AI coding assistants like Claude Code are becoming long-duration autonomous workers requiring supervision, rather than short-burst interactive tools, creating demand for asynchronous oversight mechanisms.

Privacy architecture receives notable attention in the project's design. The developer offers three tiers of data exposure: a fully local mode using accessibility trees and headless browser parsing that never captures screenshots, a locally-run vision pipeline using OmniParser that processes screen images on-device without sending data to external APIs, and a standard cloud vision mode for users comfortable with screenshot transmission to model providers. This layered approach directly acknowledges the tension that emerges when agentic systems require visual context — a concern that becomes more acute as tools like Claude's computer use capability and similar desktop agents become more prevalent. The roadmap includes full self-hosted model support for both vision and text inference, which would close the remaining data-sovereignty gap.
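The three tiers described above can be modeled as a small lookup table. The following is a minimal illustrative sketch, not the project's actual code: the names `PrivacyTier`, `DataExposure`, `EXPOSURE`, and `allowed_tiers` are hypothetical and chosen only to make the trade-off concrete.

```python
from dataclasses import dataclass
from enum import Enum


class PrivacyTier(Enum):
    """Hypothetical names for the three tiers described in the article."""
    LOCAL_ONLY = "local_only"      # accessibility tree + headless browser parsing
    LOCAL_VISION = "local_vision"  # on-device vision model such as OmniParser
    CLOUD_VISION = "cloud_vision"  # screenshots sent to a model provider


@dataclass(frozen=True)
class DataExposure:
    captures_screenshots: bool
    sends_screenshots_off_device: bool


# Assumed mapping, derived only from the article's description of each tier.
EXPOSURE = {
    PrivacyTier.LOCAL_ONLY: DataExposure(False, False),
    PrivacyTier.LOCAL_VISION: DataExposure(True, False),
    PrivacyTier.CLOUD_VISION: DataExposure(True, True),
}


def allowed_tiers(allow_screenshots: bool, allow_cloud: bool) -> list[PrivacyTier]:
    """Return the tiers compatible with a user's stated privacy constraints."""
    result = []
    for tier, exposure in EXPOSURE.items():
        if exposure.captures_screenshots and not allow_screenshots:
            continue  # this tier would capture the screen, which is disallowed
        if exposure.sends_screenshots_off_device and not allow_cloud:
            continue  # this tier would transmit screenshots, which is disallowed
        result.append(tier)
    return result
```

Under these assumptions, a user who forbids screenshots entirely is left with only the local-only tier, while a user who permits on-device capture but no cloud transmission can choose between the first two.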

The project sits at an intersection of several accelerating trends in AI development: the rise of long-horizon agentic workflows, the demand for mobile-first interfaces to AI infrastructure, and the proliferation of open-source alternatives to proprietary agent platforms. The developer explicitly draws a comparison to Anthropic's own Dispatch product, positioning this tool as broader in scope since it operates across any application or interface rather than being scoped to a specific coding assistant. The use of Claude Code to build an application that itself orchestrates Claude Code — a recursive deployment pattern — illustrates how frontier coding assistants are increasingly being used not just to accelerate individual programming tasks but to construct entire agentic systems and developer toolchains. As compute-intensive AI sessions migrate to always-on desktop environments, the need for lightweight mobile supervision layers like this one is likely to grow substantially.
