LLM gateway configuration - Claude Code Docs

LLM gateways provide a centralized proxy layer between Claude Code and model providers, handling authentication, usage tracking, cost controls, audit logging, and model routing in one place. To work with Claude Code, a gateway must expose at least one supported API format (Anthropic Messages, Bedrock InvokeModel, or Vertex rawPredict) and must forward or preserve the request headers and body fields that the format requires. Claude Code supports several gateway configurations, including LiteLLM with static API keys or dynamically rotated keys, and with either a unified endpoint or provider-specific endpoints.

Detailed Analysis

Claude Code's LLM gateway configuration defines a proxy architecture that sits between the development tool and the underlying model providers, addressing enterprise needs around authentication, cost management, and compliance. The documentation describes three supported API formats (Anthropic Messages, Bedrock InvokeModel, and Vertex rawPredict), each with specific header-forwarding or body-preservation requirements that a gateway must honor for Claude Code to keep full functionality. One important detail is that Claude Code includes the `X-Claude-Code-Session-Id` header on every API request, which lets proxy operators aggregate traffic by session without parsing request bodies. In addition, as of v2.1.126, Claude Code queries the gateway's `/v1/models` endpoint at startup when pointed at an Anthropic Messages-format gateway and adds the discovered models to the model picker, labeled "From gateway". The results are cached locally, so a later endpoint failure does not leave the picker without them.
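To make the session-header point concrete, here is a minimal Python sketch of how a proxy operator might group traffic by `X-Claude-Code-Session-Id` using only request metadata. The record shape (`headers`, `content_length`) and the sample values are hypothetical; a real gateway would read these from its own access log or middleware hooks.

```python
from collections import defaultdict
from typing import Dict, List, Optional

SESSION_HEADER = "X-Claude-Code-Session-Id"  # header name taken from the documentation


def session_id(headers: Dict[str, str]) -> Optional[str]:
    """Return the session ID, matching the header name case-insensitively."""
    for name, value in headers.items():
        if name.lower() == SESSION_HEADER.lower():
            return value
    return None


def aggregate_by_session(requests: List[dict]) -> Dict[str, Dict[str, int]]:
    """Sum request counts and body sizes per session without reading any body."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for req in requests:
        sid = session_id(req["headers"]) or "unknown"
        totals[sid]["requests"] += 1
        totals[sid]["bytes"] += req.get("content_length", 0)
    return dict(totals)


# Hypothetical log records, shaped the way a proxy access log might expose them.
sample = [
    {"headers": {"X-Claude-Code-Session-Id": "abc"}, "content_length": 1200},
    {"headers": {"x-claude-code-session-id": "abc"}, "content_length": 800},
    {"headers": {"X-Claude-Code-Session-Id": "def"}, "content_length": 500},
    {"headers": {}, "content_length": 100},
]
print(aggregate_by_session(sample))
```

Requests that arrive without the header are grouped under "unknown" here, which is one possible choice; an operator could equally drop or flag them.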

The documentation uses LiteLLM as its primary illustrative gateway, which reflects LiteLLM's prominence in the open-source proxy ecosystem. Configuration options range from a static API key set via `ANTHROPIC_AUTH_TOKEN` to dynamic key retrieval through a shell-script helper, which can pull credentials from a secrets vault or a JWT issuer and refresh them on a configurable TTL. The documentation recommends the unified LiteLLM endpoint over provider-specific pass-through routes, citing load balancing, automatic fallbacks, and consistent cost and user tracking. For organizations using AWS Bedrock or Google Vertex AI, dedicated pass-through endpoint variables (`ANTHROPIC_BEDROCK_BASE_URL`, `ANTHROPIC_VERTEX_BASE_URL`) let Claude Code route through LiteLLM, while skip-auth flags bypass Claude Code's default cloud-provider authentication flows and so decouple identity management from the underlying infrastructure.

The significance of this architecture extends beyond technical convenience. Enterprises in regulated industries face strict requirements around audit logging, budget enforcement, and data residency, and a direct developer-to-API connection cannot easily satisfy these at scale. By routing all model interactions through a gateway, organizations gain a single point of observability across teams and projects, which enables fine-grained cost attribution and rate limiting without changes to individual developer workflows. The attribution block that Claude Code prepends to system prompts, which the Anthropic API strips before processing, illustrates a deliberate design choice to preserve first-party prompt caching efficiency even as gateways introduce additional routing layers. The `CLAUDE_CODE_ATTRIBUTION_HEADER=0` opt-out for gateway-side caches further signals that Anthropic has designed Claude Code with enterprise middleware in mind.

This gateway-first approach connects to a broader industry trend in which AI development tooling is increasingly designed for organizational deployment rather than individual use. The emergence of multiple competing gateway products (LiteLLM, Kong AI Gateway, Zuplo, and others) mirrors the pattern seen in the earlier API management market, where tools like Kong and Apigee became essential infrastructure for REST API governance. For AI models specifically, gateways address an additional layer of complexity: the need to abstract over rapidly evolving and proliferating model providers, so that teams can switch underlying models (including non-Anthropic ones) without touching application code. Claude Code's support for 180+ providers through LiteLLM's unified interface exemplifies how AI-native development tools are being built to be provider-agnostic at the infrastructure level even when they are provider-specific at the UX level.

The documentation's architectural choices also reflect Anthropic's recognition that Claude Code must function credibly within existing enterprise security postures. Features like dynamic API key rotation, vault integration, and per-session request identifiers are more than developer conveniences: they are prerequisites for adoption in organizations with mature security operations. The fallback behavior in model discovery, which degrades to cached or built-in model lists when a gateway's `/v1/models` endpoint is unavailable, similarly reflects a reliability-first design suited to production environments where a network partition or gateway outage must not fully block developer workflows. Together, these design decisions position Claude Code not merely as a coding assistant but as an enterprise-grade AI development platform capable of integrating into complex, multi-layered infrastructure stacks.
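The discovery fallback described above can be sketched in Python as a three-step chain: try the gateway, fall back to a local cache, then fall back to a built-in list. This is an illustration of the pattern, not Claude Code's actual implementation. The cache file, the placeholder built-in list, and the assumed response shape (a JSON object with a `data` array of entries that each carry an `id`) are all assumptions.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable, List, Tuple

BUILT_IN_MODELS = ["example-built-in-model"]  # placeholder, not a real model ID


def fetch_model_ids(base_url: str, timeout: float = 5.0) -> List[str]:
    """GET {base_url}/v1/models and return model IDs (assumes {"data": [{"id": ...}]})."""
    url = f"{base_url.rstrip('/')}/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload["data"]]


def discover_models(
    base_url: str,
    cache_path: Path,
    fetch: Callable[[str], List[str]] = fetch_model_ids,
) -> Tuple[List[str], str]:
    """Return (models, source), where source is "gateway", "cache", or "built-in"."""
    try:
        models = fetch(base_url)
        cache_path.write_text(json.dumps(models))  # refresh the cache on success
        return models, "gateway"
    except Exception:
        # Network errors, bad JSON, or an unexpected body all fall through to the cache.
        pass
    try:
        return json.loads(cache_path.read_text()), "cache"
    except (OSError, ValueError):
        # No readable cache: use the built-in list so the caller is never blocked.
        return list(BUILT_IN_MODELS), "built-in"
```

The `fetch` parameter is injectable so the fallback order can be exercised without a live gateway, for example by passing a function that raises `OSError`.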
