Observability with OpenTelemetry - Claude Code Docs

When you run agents in production, you need visibility into what they did: which tools they called, how long each model request took, how many tokens were spent, and where failures occurred. The Agent SDK can export this data as OpenTelemetry traces, metrics, and log events.

Detailed Analysis

Anthropic's Claude Code Agent SDK incorporates native OpenTelemetry (OTel) instrumentation, giving engineering teams production-grade visibility into the behavior of AI agents built on the platform. The SDK architecture routes telemetry through the Claude Code CLI, which runs as a child process and handles the actual emission of three independent OTel signals: metrics (token counts, costs, session lengths, tool decisions), structured log events (prompts, API requests, errors, tool results), and traces (hierarchical spans covering each interaction, model request, tool call, and hook execution). This design means the SDK itself does not generate telemetry directly — configuration is passed as environment variables to the CLI, which exports via the OpenTelemetry Protocol (OTLP) to any compliant backend, including commercial offerings such as Honeycomb, Datadog, Grafana, and Langfuse, as well as self-hosted collectors. Telemetry is inactive by default and requires the explicit flag `CLAUDE_CODE_ENABLE_TELEMETRY=1` alongside at least one exporter declaration, a deliberate opt-in posture that reflects both performance and privacy considerations.
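As a rough illustration of the opt-in described above, the following TypeScript sketch assembles the environment variables a deployment might set. Only `CLAUDE_CODE_ENABLE_TELEMETRY` comes from the text; the `OTEL_*` names are the standard OpenTelemetry SDK variables and are assumed here to be the ones the CLI reads. The helper `telemetryEnv` and the endpoint value are hypothetical.

```typescript
// Sketch only: builds an env map; it does not start the CLI or export anything.
type Env = Record<string, string>;

// Signals covered by this sketch (traces are omitted for brevity).
type Signal = "metrics" | "logs";

// Assumption: standard OpenTelemetry exporter-selection variables.
const EXPORTER_VAR: Record<Signal, string> = {
  metrics: "OTEL_METRICS_EXPORTER",
  logs: "OTEL_LOGS_EXPORTER",
};

// Hypothetical helper: returns the variables needed to switch telemetry on.
function telemetryEnv(signals: Signal[], endpoint: string): Env {
  // The text says the master flag needs at least one exporter declaration.
  if (signals.length === 0) {
    throw new Error("declare at least one exporter");
  }
  const env: Env = {
    CLAUDE_CODE_ENABLE_TELEMETRY: "1", // master switch named in the text
    OTEL_EXPORTER_OTLP_PROTOCOL: "http/protobuf", // assumed transport choice
    OTEL_EXPORTER_OTLP_ENDPOINT: endpoint, // placeholder collector address
  };
  for (const signal of signals) {
    env[EXPORTER_VAR[signal]] = "otlp";
  }
  return env;
}

// Example: metrics and logs sent to a collector assumed to run locally.
const exampleEnv = telemetryEnv(["metrics", "logs"], "http://localhost:4318");
console.log(exampleEnv);
```

In a container or orchestrator the same key/value pairs would normally be set in the process environment rather than built in code.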

The telemetry architecture is noteworthy for its configurability at the per-signal level. Each of the three signals — metrics, logs, and traces — can be enabled independently, allowing operators to instrument only what they need. Traces, which provide the most granular view through named spans such as `claude_code.interaction`, `claude_code.llm_request`, `claude_code.tool`, and `claude_code.hook`, sit behind an additional beta flag (`CLAUDE_CODE_ENHANCED_TELEMETRY_BETA=1`), signaling that span-level introspection is still being stabilized. The `session.id` attribute carried on every span enables correlation across multiple `query()` calls within a single agent session, a critical capability for reconstructing multi-turn agent workflows in a tracing backend. Configuration can be injected either at the process environment level — suitable for containers and orchestrators — or per-call via `options.env`, with an important distinction: in TypeScript, `options.env` replaces the inherited environment entirely, requiring developers to explicitly spread `process.env` to avoid losing system-level variables.
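The replace-not-merge behavior of `options.env` can be sketched as follows. The helper `withTracing` is hypothetical, `OTEL_TRACES_EXPORTER` and `OTEL_EXPORTER_OTLP_ENDPOINT` are standard OpenTelemetry variable names assumed to apply here, and the base environment is passed in as a parameter so the example runs without Node type definitions. In a real program the first argument would be `process.env`.

```typescript
// Sketch only: shows why the inherited environment must be spread explicitly.
type Env = Record<string, string | undefined>;

// Hypothetical helper: layers trace settings on top of an existing environment.
function withTracing(baseEnv: Env, endpoint: string): Env {
  return {
    ...baseEnv, // keep inherited variables such as PATH and HOME
    CLAUDE_CODE_ENABLE_TELEMETRY: "1", // master switch named in the text
    CLAUDE_CODE_ENHANCED_TELEMETRY_BETA: "1", // beta flag for traces, per the text
    OTEL_TRACES_EXPORTER: "otlp", // assumption: standard OTel variable
    OTEL_EXPORTER_OTLP_ENDPOINT: endpoint, // placeholder collector address
  };
}

// Stand-in for process.env so the example is self-contained.
const inherited: Env = { PATH: "/usr/bin", HOME: "/home/agent" };

const merged = withTracing(inherited, "http://localhost:4318");
console.log(Object.keys(merged).sort());
```

The resulting object is what would be handed to `query()` as `options.env`. Without the spread, the child process would see only the telemetry variables and lose system-level ones such as `PATH`.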

A key operational consideration documented for the SDK is telemetry flushing in short-lived processes. The CLI batches telemetry and exports it on an interval (by default 60 seconds for metrics and 5 seconds for traces and logs), and it flushes pending data on a clean exit. Agents that complete normally therefore keep their telemetry, but a process killed abruptly can lose whatever is still buffered. The guidance to lower export intervals for short-running tasks is a practical tradeoff: less data sits in the buffer at any moment, at the cost of slightly higher network overhead. This is a well-understood challenge in distributed systems observability, and the fact that it surfaces explicitly in Claude Code's documentation suggests Anthropic is designing for real production deployment patterns rather than toy demonstrations.
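A minimal sketch of the interval guidance for short-running tasks, under stated assumptions: `OTEL_METRIC_EXPORT_INTERVAL` is the standard OpenTelemetry variable, while `OTEL_LOGS_EXPORT_INTERVAL` and `OTEL_TRACES_EXPORT_INTERVAL` are assumed names for the log and trace intervals and should be checked against the SDK documentation. The helper `exportIntervals` is hypothetical, and all values are in milliseconds.

```typescript
// Sketch only: produces interval overrides for a short-lived agent run.
type Env = Record<string, string>;

// Hypothetical helper: applies one interval (in ms) to all three signals.
function exportIntervals(intervalMs: number): Env {
  if (!Number.isInteger(intervalMs) || intervalMs <= 0) {
    throw new Error("interval must be a positive whole number of milliseconds");
  }
  const value = String(intervalMs);
  return {
    OTEL_METRIC_EXPORT_INTERVAL: value, // default described in the text: 60 s
    OTEL_LOGS_EXPORT_INTERVAL: value, // assumed name; default in the text: 5 s
    OTEL_TRACES_EXPORT_INTERVAL: value, // assumed name; default in the text: 5 s
  };
}

// Example: export every second for a task expected to finish quickly.
const shortTaskEnv = exportIntervals(1000);
console.log(shortTaskEnv);
```

Shorter intervals narrow the window of data that an abrupt kill can discard, but they do not replace a clean shutdown, which is what triggers the final flush.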

The privacy architecture of the telemetry system is structurally conservative. By default, only structural metadata is exported — token counts, durations, model names, tool names — while the actual content of prompts, tool inputs, and tool outputs is suppressed. Three separate opt-in variables (`OTEL_LOG_USER_PROMPTS=1`, `OTEL_LOG_TOOL_DETAILS=1`, `OTEL_LOG_TOOL_CONTENT=1`) progressively add content to exports, with the most verbose mode truncating tool payloads at 60 KB. This layered approach allows organizations to calibrate their observability posture against their data governance requirements, a meaningful design choice given that AI agents routinely operate on sensitive codebases, file systems, and API credentials.
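The three content opt-ins can be modeled as independent switches, as in the sketch below. The variable names are taken from the text; the `ContentOptIns` shape and the `contentEnv` helper are hypothetical, and treating the flags as independent of one another is a simplifying assumption.

```typescript
// Sketch only: maps explicit opt-ins to the content-logging variables.
type Env = Record<string, string>;

// Hypothetical options object; every field defaults to "off".
interface ContentOptIns {
  userPrompts?: boolean;
  toolDetails?: boolean;
  toolContent?: boolean;
}

// Hypothetical helper: emits a variable only for each explicit opt-in.
function contentEnv(optIns: ContentOptIns = {}): Env {
  const env: Env = {};
  if (optIns.userPrompts) env.OTEL_LOG_USER_PROMPTS = "1";
  if (optIns.toolDetails) env.OTEL_LOG_TOOL_DETAILS = "1";
  if (optIns.toolContent) env.OTEL_LOG_TOOL_CONTENT = "1"; // payloads truncated at 60 KB, per the text
  return env;
}

// Default posture: no content variables, so only structural metadata is exported.
const metadataOnly = contentEnv();

// Example: opt in to prompt text but keep tool inputs and outputs suppressed.
const promptsOnly = contentEnv({ userPrompts: true });
console.log(metadataOnly, promptsOnly);
```

Keeping the default as an empty map mirrors the conservative posture described above: content leaves the process only when someone sets a flag on purpose.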

The broader significance of this capability lies in what it signals about the maturation of AI agent infrastructure. OpenTelemetry has become the de facto standard for observability in cloud-native systems, and its adoption by Anthropic's Claude Code SDK represents the convergence of AI agent tooling with established SRE and platform engineering practices. Third-party integrations already documented by vendors including Oodle, SigNoz, Honeycomb, and Axiom confirm that the ecosystem is actively building around this interface. As AI agents take on more complex, long-running, and production-critical tasks, the ability to audit token spend, trace failure paths, and attribute costs to specific teams or workloads becomes operationally essential — not merely useful. Anthropic's embedding of OpenTelemetry natively into the CLI, rather than leaving it to application developers to instrument, lowers the barrier to production-ready agent deployment considerably.
