@yoavgo 👋 yep we do this for Bedrock, Vertex, and Azure, since they don’t have

Bedrock, Vertex, and Azure lack native token-counting APIs, so developers call the Haiku model and extract the token count from the response. Anthropic API provides a dedicated token-counting endpoint directly. This approach serves as a fallback when primary APIs are unavailable and eliminates the need to maintain separate tokenizers that could diverge from actual tokenization.

Detailed Analysis

Claude Code, Anthropic's AI-powered coding assistant, employs an unconventional fallback mechanism for token counting when operating through third-party cloud platforms: rather than using a dedicated local tokenizer or a purpose-built counting endpoint, it sends requests to the lightweight Claude Haiku model and reads the token count from the resulting API response metadata. This behavior was surfaced publicly by developer Yoav Goldberg and confirmed by Boris Cherny, a member of the Claude Code team, who clarified that the technique is specifically used when Claude Code runs via Amazon Bedrock, Google Vertex AI, and Microsoft Azure — platforms that, at the time of the thread, lacked access to Anthropic's dedicated token-counting API endpoint. When Claude Code operates directly against the Anthropic API, it uses the token-counting endpoint as intended.

The technical rationale behind this design choice illuminates the constraints Anthropic faces when distributing Claude through third-party cloud infrastructure. Anthropic's tokenizer is proprietary, and as multiple participants in the thread noted, the company has not released a standalone tokenizer library for local use, nor does it publicly document certain implementation details — such as how whitespace or punctuation interacts with token boundaries. Rather than shipping a tokenizer that could drift from the production model over time, or risk returning inaccurate counts, the engineering team opted to exploit the fact that token counts are model-agnostic across the Claude family: since Haiku, Sonnet, and Opus all share the same tokenization scheme, a cheap Haiku inference call returns an accurate count for any context, regardless of which model will ultimately process it. The approach trades minor additional API cost for accuracy and maintainability.

Community reaction to the revelation was largely amused, with some developers pointing out the philosophical irony of using a large language model to perform a task — counting tokens — that is essentially deterministic and computationally trivial. Critics noted that enterprise-scale deployments routing through trillion-dollar cloud infrastructure should not require such workarounds, and that the underlying issue is the absence of a standardized token-counting API across Bedrock, Vertex, and Azure. One commenter pointed out that Amazon Bedrock does maintain a CountTokens API, though it has not yet been extended to support newer Claude models, suggesting the gap is partially a matter of platform versioning lag rather than fundamental absence of infrastructure. The thread also surfaced a user request to decouple the token-counting model from other model-selection configurations, particularly for users who have set up Opus-only environments for specific repositories and do not want a Haiku call silently occurring in the background.

The broader context here is the increasing complexity of deploying frontier AI models through multi-cloud distribution channels. Anthropic has pursued aggressive integration with Amazon Web Services, Google Cloud, and Microsoft Azure — the latter formalized through a strategic partnership announced in November 2025 involving NVIDIA-powered Azure infrastructure. While these integrations expand enterprise access and enable features like IAM-based authentication, workload identity federation, and native observability tooling, they also introduce capability asymmetries: features available natively on the Anthropic API may not be immediately mirrored on partner platforms. Token counting is a relatively mundane example, but it illustrates a structural challenge for developer tooling built atop distributed AI infrastructure — namely, that the abstraction layers introduced by cloud intermediaries can force unexpected engineering compromises at the application layer. The Claude Code team's Haiku-based workaround is functional and defensible, but it underscores the degree to which tooling reliability currently depends on creative problem-solving rather than standardized cross-platform APIs.

Read original article →

Detailed Analysis

Don't Miss a Deploy