claudely: launch Claude Code against a local LLM provider like LM Studio / Ollama / llama.cpp without trashing your real Claude config

Claudely is a CLI tool that enables Claude Code to use locally hosted language models instead of Anthropic's API while preserving all of Claude Code's plugins, skills, and MCP ecosystem. The tool supports multiple local LLM providers, including LM Studio, Ollama, llama.cpp, and Anthropic-compatible endpoints. It handles provider-specific configuration automatically and fixes a prompt-cache performance bug that degrades local model speed. Available as an open-source Node.js package, claudely lets developers switch between local models and Anthropic's hosted Claude without modifying their shell configuration.

Detailed Analysis

Claudely represents a pragmatic community solution to a growing tension in AI developer tooling: the rich ecosystem built around Anthropic's Claude Code has no easy path to local LLM substitution. Claude Code, Anthropic's terminal-based coding agent, has accumulated a substantial body of supporting infrastructure (MCP servers, slash commands, skills, plugins, and hooks) that developers have invested significant time in configuring and extending. When those developers seek to route their workloads through local models served by runtimes like LM Studio, Ollama, or llama.cpp, they face a binary choice: abandon their tooling or abandon local inference. Claudely, an MIT-licensed community utility published on npm, eliminates that binary by acting as a thin session-scoping layer that launches Claude Code with the correct base URL and authentication configuration for a locally running model, leaving the user's actual Claude configuration and shell environment entirely untouched.

The technical implementation is deliberately minimal in scope. Claudely does not replace or fork Claude Code; it wraps it. By spawning `claude` with provider-specific environment variables pre-configured, it allows the underlying client to treat a local inference server — whatever is serving at the relevant endpoint — as if it were the Anthropic API. This architectural choice means that every capability Claude Code supports, from MCP integrations to custom slash commands, continues to function without modification. The tool also addresses a concrete performance problem: a prompt-cache handling bug that, left uncorrected, reportedly degrades local-model session speed by approximately 90%. That the project was itself built with Claude Code's assistance is a small but pointed illustration of the recursive utility it enables.
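The wrapping approach described above can be illustrated with a minimal sketch. This is not claudely's actual code (the real package is written for Node.js). It only shows the general idea of session-scoped configuration: copy the current environment, override a few variables, and hand the copy to a child `claude` process. `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL` are environment variables that Claude Code reads. The provider table, the ports (each server's usual default), the placeholder token, and the function names are assumptions made for illustration, and which variables claudely itself sets is not stated in the source.

```python
import os
import subprocess

# Hypothetical provider table. The ports are the usual defaults for each
# server; whether a given server version speaks an Anthropic-compatible
# API at that address is an assumption to verify locally.
PROVIDER_BASE_URLS = {
    "lmstudio": "http://localhost:1234",
    "ollama": "http://localhost:11434",
    "llamacpp": "http://localhost:8080",
}


def build_session_env(provider, model, parent_env=None):
    """Return a copy of the parent environment with local-provider overrides.

    The parent mapping is never mutated, so the caller's shell-level
    configuration stays untouched.
    """
    env = dict(os.environ if parent_env is None else parent_env)
    env["ANTHROPIC_BASE_URL"] = PROVIDER_BASE_URLS[provider]
    # Placeholder value (assumption): local servers typically ignore the token.
    env["ANTHROPIC_AUTH_TOKEN"] = "local"
    env["ANTHROPIC_MODEL"] = model
    return env


def launch_claude(provider, model, extra_args=()):
    """Spawn `claude` with the scoped environment and return its exit code."""
    env = build_session_env(provider, model)
    return subprocess.run(["claude", *extra_args], env=env).returncode
```

Because the overrides live only in the child process's environment, they disappear when the session ends, and a plain `claude` invocation afterwards still talks to Anthropic's API. Calling `launch_claude("ollama", "my-local-model")` would start such a session, where `my-local-model` is a placeholder for whatever model the local server has loaded.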

The broader significance of claudely lies in what it reveals about community adaptation around commercial AI tooling. Anthropic has invested heavily in making Claude Code a compelling developer environment, and that investment has created genuine ecosystem lock-in — not through restrictive licensing, but through accumulated workflow value. Developers do not want to leave Claude Code's orbit; they want to extend its reach. Claudely's support for Anthropic-compatible endpoints, including proxies like LiteLLM and claude-code-router that themselves bridge to OpenAI-protocol backends such as vLLM, suggests the tool is also positioning itself as connective tissue for a heterogeneous local inference landscape rather than a solution narrowly scoped to a single runtime.

This pattern, in which community tooling treats a commercial AI client as a stable, high-value interface layer while abstracting over the underlying model provider, is likely to intensify as local model capabilities continue to close the gap with frontier APIs. The emerging dynamic is one in which the client ecosystem, not the model weights themselves, becomes the primary source of developer stickiness. Projects like claudely implicitly validate Anthropic's strategy of building a rich, extensible coding agent, even as they enable users to route around Anthropic's own inference infrastructure. Whether Anthropic views such workarounds as a threat to API revenue or as evidence of Claude Code's platform-level influence is an open question, but the existence and apparent appeal of claudely suggest the latter framing may be the more strategically accurate one.
