Show HN: I built a full LLM chat client as a Neovim filetype

Flemma is a Neovim filetype that functions as a complete LLM chat client, supporting multiple AI providers including Anthropic, OpenAI, Vertex, and Moonshot. Conversations live in the editor buffer as plain Markdown files, which makes them portable, version-controllable, and greppable. The tool also supports tool calling, autonomous agent loops, prompt caching, and extended thinking. The project grew out of the creator's desire to move AI work out of web-based developer portals and into their editor; what began as a simple concept evolved into a full AI workspace.

Detailed Analysis

Flemma.nvim represents a notable entry in the growing ecosystem of developer-native AI tooling, built by an independent developer who sought to escape the friction of browser-based LLM interfaces like Anthropic's Claude Workbench, OpenAI Platform, and Google Cloud's Vertex console. The project implements a full LLM chat client as a custom Neovim filetype, treating `.chat` files — plain Markdown documents annotated with role markers such as `@You:` and `@Assistant:` — as first-class conversation objects with their own parser, abstract syntax tree, LSP server, and template engine. The result is a self-contained AI workspace that integrates natively with Anthropic Claude, OpenAI, Vertex, and Moonshot, and allows users to switch between providers mid-conversation — a capability the developer uses to solicit perspectives from multiple models during research before synthesizing findings into a unified document.
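
To make the `.chat` idea concrete, here is a minimal sketch of how a document using the `@You:` and `@Assistant:` markers could be split into messages. It is written in Python for illustration and is not Flemma's parser (Flemma is a Neovim plugin with its own syntax tree). The `Message` type and `parse_chat` function are hypothetical names, and the sketch assumes those two markers are the only roles and that each one starts at the beginning of a line.

```python
import re
from dataclasses import dataclass

# Assumption: only the two role markers named in the article are recognised.
ROLE_MARKER = re.compile(r"^@(You|Assistant):\s?(.*)$")


@dataclass
class Message:
    role: str      # "You" or "Assistant", taken from the marker
    content: str   # text after the marker, up to the next marker line


def parse_chat(text: str) -> list[Message]:
    """Split a .chat document into messages at each role-marker line (hypothetical helper)."""
    messages: list[Message] = []
    for line in text.splitlines():
        match = ROLE_MARKER.match(line)
        if match:
            # A marker line opens a new message; text on the same line is kept.
            messages.append(Message(match.group(1), match.group(2)))
        elif messages:
            # Any other line continues the most recent message.
            messages[-1].content += "\n" + line
        # Lines before the first marker are ignored in this sketch.
    for message in messages:
        message.content = message.content.strip()
    return messages


# Usage:
doc = "@You: Summarise this file.\n@Assistant: It defines a parser.\n\nNothing else.\n@You: Thanks!"
for message in parse_chat(doc):
    print(message.role, "->", repr(message.content))
```

A line-based scan like this misfires as soon as a marker appears inside a fenced code block or quoted text, which is one reason a tool in Flemma's position would want a real parser and syntax tree rather than a single regex.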

The architectural decision to make the buffer itself the complete application state is one of Flemma's most technically distinctive choices. Because there are no hidden databases, JSON history files, or persistent server processes, conversations remain portable, greppable, and version-controllable with Git. This design aligns closely with the Unix tradition of plain-text persistence and makes Flemma's chat history a first-class artifact of the developer's working environment rather than data siloed inside a proprietary tool. The project supports prompt caching, extended thinking (Anthropic's term for letting Claude reason at length before it answers), sandboxed tool execution, autonomous agent loops, and seven built-in tools. That feature set places it functionally in the same category as more established agentic coding environments such as Aider and Claude Code, which the developer credits as direct influences on Flemma's evolution.
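
An autonomous agent loop, reduced to its core, alternates between asking the model for the next step and running whatever tool it requests, appending every exchange to the transcript until the model returns a plain answer. The sketch below assumes a provider-agnostic `model` callable and a dictionary of tools; `ModelReply`, `run_agent`, the `Tool` role label, and the `max_turns` cap are hypothetical and do not describe Flemma's actual implementation or its seven built-in tools. Sandboxing is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ModelReply:
    text: str = ""
    tool_name: Optional[str] = None          # set when the model requests a tool
    tool_args: dict = field(default_factory=dict)


# Hypothetical: a "model" is any function from the transcript so far to a reply.
Model = Callable[[list[tuple[str, str]]], ModelReply]


def run_agent(model: Model, tools: dict[str, Callable[..., str]],
              prompt: str, max_turns: int = 8) -> list[tuple[str, str]]:
    """Loop model -> tool -> model until a plain answer arrives or max_turns is hit."""
    transcript: list[tuple[str, str]] = [("You", prompt)]
    for _ in range(max_turns):
        reply = model(transcript)
        if reply.tool_name is None:
            # No tool requested: this is the final answer.
            transcript.append(("Assistant", reply.text))
            return transcript
        # Record the request, run the tool, and feed its result back in.
        transcript.append(("Assistant", f"[tool call] {reply.tool_name}({reply.tool_args})"))
        tool = tools.get(reply.tool_name)
        result = tool(**reply.tool_args) if tool else f"unknown tool: {reply.tool_name}"
        transcript.append(("Tool", result))
    raise RuntimeError("agent did not finish within max_turns")


# Usage with a scripted stand-in for a real model:
def scripted_model(transcript):
    if transcript[-1][0] == "You":
        return ModelReply(tool_name="add", tool_args={"a": 2, "b": 3})
    return ModelReply(text="The sum is " + transcript[-1][1])


for role, text in run_agent(scripted_model, {"add": lambda a, b: str(a + b)}, "What is 2 + 3?"):
    print(f"@{role}: {text}")
```

Because each step is appended as an ordinary (role, text) pair, the whole run stays inspectable as plain text, which is the same property that makes a buffer-as-state design attractive.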

The emergence of Flemma reflects a broader and accelerating trend: the disaggregation of AI interfaces away from browser-based portals and toward developer-controlled, editor-native environments. Tools like Aider, Claude Code, Cursor, and a growing catalog of Neovim and VS Code plugins signal that a meaningful segment of the developer community prefers to keep AI interactions within the context of their existing text-editing workflows rather than context-switching to dedicated applications. For Anthropic specifically, this trend is significant — Claude's API is being consumed not just through its own products but through an expanding layer of third-party clients that mediate the user experience entirely, shifting the competitive surface from interface quality to model capability, API reliability, and pricing.

Flemma also illustrates the practical demand for multi-model workflows at the individual developer level. The ability to query Claude, OpenAI's GPT models, and models served through Vertex in sequence within a single conversation thread, then consolidate their outputs, reflects an emerging pattern in professional AI use where no single model is presumed to be authoritative across all tasks. This "model-agnostic" posture, once primarily a concern of enterprise AI infrastructure teams, is now being built into personal developer tooling. As the gap between frontier models narrows on many benchmarks, tools that enable fluid switching between providers may become as important to productivity as the models themselves, and projects like Flemma suggest that the Neovim community is building that infrastructure for itself without waiting for any single AI lab to deliver it.
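
Much of what makes mid-conversation switching work is translation: one neutral transcript has to be reshaped into each provider's request format. The sketch below shows two real differences in simplified, text-only form. Anthropic's Messages API takes the system prompt as a top-level `system` field and requires `max_tokens`, while OpenAI's Chat Completions API keeps the system prompt as a message with role `system`. The function names, the `Transcript` alias, the default token limit, and the placeholder model ids are assumptions for illustration; authentication, HTTP calls, tool definitions, and Vertex or Moonshot adapters are omitted, and Flemma's own adapters may be structured quite differently.

```python
# Neutral transcript: (role, text) pairs using "system", "user", "assistant".
Transcript = list[tuple[str, str]]


def to_anthropic(transcript: Transcript, model: str, max_tokens: int = 1024) -> dict:
    """Build a simplified Anthropic Messages API body (system prompt is top-level)."""
    system = "\n\n".join(text for role, text in transcript if role == "system")
    payload = {
        "model": model,
        "max_tokens": max_tokens,  # the Messages API requires this field
        "messages": [{"role": role, "content": text}
                     for role, text in transcript if role != "system"],
    }
    if system:
        payload["system"] = system
    return payload


def to_openai(transcript: Transcript, model: str) -> dict:
    """Build a simplified OpenAI Chat Completions body (system stays in messages)."""
    return {
        "model": model,
        "messages": [{"role": role, "content": text} for role, text in transcript],
    }


# Usage: ask one provider, then hand the same growing thread to another.
thread = [("system", "Be concise."), ("user", "Compare B-trees and LSM-trees.")]
first_request = to_anthropic(thread, model="CLAUDE_MODEL_ID")    # placeholder id
thread.append(("assistant", "...Claude's answer..."))
thread.append(("user", "Critique the answer above."))
second_request = to_openai(thread, model="OPENAI_MODEL_ID")      # placeholder id
```

One design question this glosses over: an answer written by Claude reaches the OpenAI model labelled as an ordinary `assistant` turn, so the second model sees it as its own earlier output. A tool may prefer to relabel another model's replies as quoted user context instead.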
