Making Claude Code more secure and autonomous with sandboxing

Anthropic introduced two sandboxing features for Claude Code that isolate code execution with filesystem and network controls. In Anthropic's internal usage they reduced permission prompts by 84%, and they are designed to keep a prompt injection attack from reading sensitive files or exfiltrating data. The sandboxed bash tool lets Claude run commands freely within defined boundaries using OS-level primitives (Linux bubblewrap, macOS seatbelt), while Claude Code on the web runs each session in an isolated cloud environment, with credentials handled by a custom proxy service. The sandbox runtime is open-sourced as a research preview so that developers building their own agents can adopt the same approach.

Detailed Analysis

Anthropic has introduced two significant sandboxing features for Claude Code, its agentic coding assistant, designed to simultaneously improve security and reduce the friction of constant permission prompts. The system operates on a dual-isolation model: filesystem isolation, which restricts Claude's read and write access to specific directories, and network isolation, which limits outbound connections to an approved set of servers routed through a controlled proxy. Together, these two mechanisms address the primary threat vector in agentic coding environments — prompt injection — by ensuring that even a successfully compromised Claude Code session cannot exfiltrate sensitive files such as SSH keys or communicate with attacker-controlled infrastructure. Anthropic reports that in internal usage, sandboxing reduces permission prompts by 84%, directly countering the "approval fatigue" dynamic in which users, overwhelmed by repeated confirmation dialogs, may inadvertently authorize harmful operations.
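The dual-isolation model can be pictured as two independent checks, one on paths and one on network destinations. The following is a minimal sketch only, not Claude Code's actual configuration format: the `SandboxPolicy` class, its field names, and the example paths and hosts are all hypothetical, and the choice of a write allow-list combined with a read deny-list is an assumption made for illustration.

```python
import posixpath
from dataclasses import dataclass
from typing import FrozenSet, Tuple


def _is_under(path: str, root: str) -> bool:
    """True if `path` equals `root` or lies beneath it.

    This is a lexical check only (it collapses ".." but does not resolve
    symlinks). A real sandbox enforces the boundary in the OS, not in Python.
    """
    path = posixpath.normpath(path)
    root = posixpath.normpath(root)
    return path == root or path.startswith(root.rstrip("/") + "/")


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy object; all field names are illustrative."""

    writable_roots: Tuple[str, ...]     # filesystem isolation: where writes may land
    denied_read_roots: Tuple[str, ...]  # filesystem isolation: what must stay unreadable
    allowed_hosts: FrozenSet[str]       # network isolation: approved destinations

    def may_write(self, path: str) -> bool:
        return any(_is_under(path, root) for root in self.writable_roots)

    def may_read(self, path: str) -> bool:
        return not any(_is_under(path, root) for root in self.denied_read_roots)

    def may_connect(self, host: str) -> bool:
        return host.lower() in self.allowed_hosts


# Example values are assumptions, chosen to mirror the SSH-key scenario above.
policy = SandboxPolicy(
    writable_roots=("/work/project",),
    denied_read_roots=("/home/dev/.ssh",),
    allowed_hosts=frozenset({"pypi.org"}),
)
```

Under this sketch, an injected instruction to read `/home/dev/.ssh/id_rsa` fails the filesystem check, and an attempt to upload anything to an unlisted host fails the network check. Either control alone would leave a gap, which is why the two are paired.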

The technical implementation is grounded in operating system-level primitives rather than heavier containerization approaches. On Linux, Anthropic uses bubblewrap, and on macOS, the seatbelt framework, to enforce restrictions that apply not only to Claude Code's direct actions but also to any scripts, subprocesses, or MCP servers spawned during a session. Network isolation is enforced via a Unix domain socket connected to an external proxy server, which handles domain allow-listing and user confirmation for newly requested destinations. The design choice to build on OS-level primitives — rather than spinning up full containers — reflects a deliberate tradeoff favoring lower overhead and faster developer iteration while still achieving meaningful isolation boundaries. Both components are configurable, allowing developers to fine-tune which file paths and network destinations fall inside or outside the sandbox.
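To make the Linux side concrete, here is a rough sketch of how a bubblewrap command line for this kind of sandbox might be assembled. It is not Anthropic's actual invocation: the `build_bwrap_argv` helper and the example paths are hypothetical, and the wiring of the proxy's Unix domain socket is omitted. Only the `bwrap` options themselves (`--ro-bind`, `--bind`, `--tmpfs`, `--dev`, `--proc`, `--unshare-net`, `--die-with-parent`, `--chdir`) are real bubblewrap flags. The function only builds the argument list and does not run anything.

```python
from typing import List, Sequence


def build_bwrap_argv(
    project_dir: str,
    command: Sequence[str],
    hidden_dirs: Sequence[str] = (),
) -> List[str]:
    """Assemble (but do not execute) an illustrative bwrap command line."""
    argv = [
        "bwrap",
        "--ro-bind", "/", "/",   # expose the host filesystem read-only
        "--dev", "/dev",         # fresh minimal /dev
        "--proc", "/proc",       # fresh /proc for the new namespaces
        "--tmpfs", "/tmp",       # private scratch space
    ]
    # Mount an empty tmpfs over each sensitive directory so its real
    # contents are not visible inside the sandbox.
    for hidden in hidden_dirs:
        argv += ["--tmpfs", hidden]
    argv += [
        "--bind", project_dir, project_dir,  # the only host path left writable
        "--unshare-net",                     # new network namespace, no outside access
        "--die-with-parent",                 # tear the sandbox down with its launcher
        "--chdir", project_dir,
    ]
    # Everything after the options is the command to run; subprocesses it
    # spawns inherit the same restrictions.
    argv += list(command)
    return argv


example = build_bwrap_argv(
    "/work/project", ["npm", "test"], hidden_dirs=["/home/dev/.ssh"]
)
print(" ".join(example))
```

Because bubblewrap applies its mount options in order, the read-only bind of `/` comes first and the writable bind of the project directory comes later, so the narrower rule wins. Starting a process this way is typically much cheaper than launching a full container image, which is the tradeoff the paragraph above describes.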

The second feature, Claude Code on the web, extends these principles to a cloud-hosted execution environment. Each session runs in an isolated cloud sandbox, and sensitive credentials, such as git authentication tokens or code-signing keys, are kept out of the sandbox entirely. Git interactions are instead mediated by a custom proxy service using scoped, session-specific credentials, so even if code executing inside the sandbox were fully compromised, the attacker could not escalate to the user's broader credential store or repository access. This architecture treats the sandbox as untrusted by design: a "zero trust" posture applied to the agent's own execution environment, and a departure from approaches that rely primarily on behavioral guardrails rather than hard isolation boundaries.
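The credential-proxy idea can be sketched in a few lines. This is an illustration under stated assumptions, not the real service: the `GitProxy` and `SessionGrant` names are hypothetical, and scoping a session to a single repository and branch is an assumed policy chosen to show what "scoped" could mean. The point it demonstrates is that the sandbox only ever holds a throwaway session credential, while the real token stays on the proxy side.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SessionGrant:
    """What one sandbox session is allowed to touch (assumed scope)."""

    repo: str
    branch: str


class GitProxy:
    """Hypothetical stand-in for a proxy running outside the sandbox."""

    def __init__(self, upstream_token: str) -> None:
        self._upstream_token = upstream_token  # never handed to the sandbox
        self._grants: Dict[str, SessionGrant] = {}

    def open_session(self, repo: str, branch: str) -> str:
        """Mint a random, session-specific credential for the sandbox."""
        scoped = secrets.token_urlsafe(32)
        self._grants[scoped] = SessionGrant(repo, branch)
        return scoped

    def upstream_auth_for_push(
        self, scoped: str, repo: str, branch: str
    ) -> Optional[str]:
        """Return the real token for the proxy's forwarded request, or None.

        The return value is used by the proxy itself when it forwards the
        push upstream; it is never sent back into the sandbox.
        """
        grant = self._grants.get(scoped)
        if grant is None or (repo, branch) != (grant.repo, grant.branch):
            return None
        return self._upstream_token

    def close_session(self, scoped: str) -> None:
        """Invalidate the scoped credential when the session ends."""
        self._grants.pop(scoped, None)
```

In this sketch, a compromised sandbox that steals its own scoped credential gains nothing beyond what the session could already do. The credential is useless for other repositories or branches, and it stops working once the session is closed.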

The release carries significance beyond Claude Code itself. Anthropic has open-sourced the sandbox runtime as a research preview, explicitly encouraging other AI development teams to adopt similar isolation architectures for their own agents. This move positions sandboxing not merely as a Claude-specific feature but as a proposed industry standard for safe agentic deployment. The decision to open-source reflects a recognition that the security challenges facing Claude Code — prompt injection, credential exfiltration, unintended filesystem modification — are universal to any sufficiently capable coding agent, regardless of the underlying model. By releasing the tooling openly, Anthropic is attempting to establish architectural norms for the broader ecosystem at a moment when agentic AI systems are proliferating rapidly and security standards remain nascent.

This development fits into a broader pattern of AI labs grappling with the tension between agent autonomy and safety. As coding assistants evolve from single-turn suggestion tools into persistent agents capable of navigating entire codebases, running arbitrary commands, and interacting with external services, the attack surface expands dramatically. Anthropic's sandboxing approach represents a structural answer to this challenge — one that accepts the premise that agents will sometimes be manipulated or make errors, and designs the execution environment to contain the blast radius of such failures. The dual-isolation architecture, the open-source release, and the cloud-hosted offering together suggest a maturing understanding within the field that security for agentic AI cannot be solved at the model level alone, but requires layered, system-level defenses built into the infrastructure surrounding the model.
