I built a self-hosted memory layer for Claude that runs free on Cloudflare — open source

A developer created a Cloudflare Worker-based MCP server that acts as a persistent memory system for Claude, with tools to remember, recall, list recent, and forget information across sessions. The system generates vector embeddings through Cloudflare's Workers AI to enable semantic search, allowing Claude to find memories by meaning rather than exact keyword matching. The entire solution runs at no cost on Cloudflare's free tier and includes a one-click deployment option.

Detailed Analysis

A developer identified by the GitHub handle rahilp has released an open-source, self-hosted memory layer for Anthropic's Claude AI assistant, designed to address one of the most commonly cited limitations of large language model assistants: the inability to retain information across separate conversation sessions. The project, dubbed "second-brain-cloudflare," is implemented as a Cloudflare Worker functioning as a Model Context Protocol (MCP) server. It exposes four tools (remember, recall, list_recent, and forget) that Claude can invoke automatically when given appropriate system prompt instructions. The stack runs entirely on Cloudflare's free tier and comprises Cloudflare Workers for serverless compute, D1 (Cloudflare's SQLite-based edge database) for structured storage, Vectorize for vector storage and search, and Workers AI for embedding generation. The project supports Claude Desktop, Claude Code, and claude.ai via custom connectors, and ships with a one-click deploy button for ease of setup.
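To make the four tools named above concrete, here is a minimal in-memory sketch of how such a tool surface could be modeled. It is not the project's code: the `MemoryStore` class, the `ToolCall` type, and `handleToolCall` are hypothetical names invented for illustration, the store is a plain `Map` rather than D1, and `recall` uses a substring match as a stand-in for the embedding-based search the project is described as using.

```typescript
// Hypothetical in-memory model of the four tools. The real project is
// described as persisting to D1 and Vectorize, which this sketch does not touch.
interface Memory {
  id: string;
  text: string;
  createdAt: number; // milliseconds since epoch
}

class MemoryStore {
  private memories = new Map<string, Memory>();
  private nextId = 1;

  // remember: store a new note and return its id.
  remember(text: string, now: number = Date.now()): string {
    const id = `mem-${this.nextId++}`;
    this.memories.set(id, { id, text, createdAt: now });
    return id;
  }

  // recall: placeholder case-insensitive substring match (assumption);
  // a semantic version would compare embeddings instead.
  recall(query: string): Memory[] {
    const needle = query.toLowerCase();
    return Array.from(this.memories.values()).filter((m) =>
      m.text.toLowerCase().includes(needle)
    );
  }

  // list_recent: newest first, capped at `limit`.
  listRecent(limit: number = 10): Memory[] {
    return Array.from(this.memories.values())
      .sort((a, b) => b.createdAt - a.createdAt)
      .slice(0, limit);
  }

  // forget: delete by id; returns true if a memory was removed.
  forget(id: string): boolean {
    return this.memories.delete(id);
  }
}

// Hypothetical request shape: one variant per tool name from the article.
type ToolCall =
  | { tool: "remember"; text: string }
  | { tool: "recall"; query: string }
  | { tool: "list_recent"; limit?: number }
  | { tool: "forget"; id: string };

// Route a tool call to the matching store method.
function handleToolCall(store: MemoryStore, call: ToolCall): unknown {
  switch (call.tool) {
    case "remember":
      return { id: store.remember(call.text) };
    case "recall":
      return store.recall(call.query);
    case "list_recent":
      return store.listRecent(call.limit);
    case "forget":
      return { deleted: store.forget(call.id) };
  }
}
```

Modeling the calls as a discriminated union lets the compiler check that every tool name has a handler, which is one reasonable way to keep a small tool surface like this in sync with its implementation.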

The technical centerpiece of the implementation is its semantic recall mechanism. Rather than relying on keyword matching, the system generates a vector embedding for each stored memory using the bge-small-en-v1.5 model through Workers AI, then indexes those embeddings in Cloudflare Vectorize. A recall query is embedded the same way and matched against the stored vectors by similarity, so results match by conceptual meaning rather than exact phrasing: a user who stores a note about "users drop off at checkout" can retrieve it later by querying for "onboarding problems," because the two phrases land close together in the embedding space. This design choice differentiates the project from simpler text-search approaches and reflects a broader maturation in how developers are approaching persistent AI memory: not as a simple log retrieval problem, but as a semantic understanding problem.
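The ranking step behind this kind of semantic recall can be sketched with cosine similarity over vectors. The sketch below is an illustration under stated assumptions, not the project's implementation: `cosineSimilarity`, `topK`, and the `StoredMemory` type are hypothetical names, and the three-dimensional vectors are hand-written toy values, not real bge-small-en-v1.5 embeddings. In the project as described, Workers AI produces the vectors and Vectorize performs the nearest-neighbor search.

```typescript
type Vector = number[];

// Hypothetical record: a memory's text plus its embedding.
interface StoredMemory {
  id: string;
  text: string;
  values: Vector;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored memory against the query vector and keep the k best.
function topK(
  query: Vector,
  memories: StoredMemory[],
  k: number
): { id: string; text: string; score: number }[] {
  return memories
    .map((m) => ({
      id: m.id,
      text: m.text,
      score: cosineSimilarity(query, m.values),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy data (assumption): invented vectors chosen so the first note
// points in nearly the same direction as the query.
const toyMemories: StoredMemory[] = [
  { id: "m1", text: "users drop off at checkout", values: [0.9, 0.1, 0.0] },
  { id: "m2", text: "favorite coffee order", values: [0.0, 1.0, 0.0] },
  { id: "m3", text: "server maintenance window", values: [0.0, 0.0, 1.0] },
];

// Stand-in for the embedding of a query such as "onboarding problems".
const toyQuery: Vector = [1.0, 0.0, 0.0];
const bestMatches = topK(toyQuery, toyMemories, 1);
```

A linear scan like this is only practical for small collections; a dedicated vector index exists to avoid scoring every stored vector on each query.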

The project is also notable for its use of Claude itself as a primary development tool throughout the build process. The developer credits Claude with writing most of the MCP server implementation in TypeScript, helping architect the Vectorize and D1 data layer, generating iOS Shortcuts templates and a bookmarklet for cross-platform access, and authoring the project's README documentation. This recursive dynamic — where Claude is used to build tooling that enhances Claude's own capabilities — illustrates a pattern becoming increasingly common in the developer community, where AI assistants are treated as active collaborators in extending AI infrastructure rather than passive end products.

The release arrives as the Model Context Protocol, Anthropic's open standard for connecting AI models to external tools, data sources, and services, gains momentum. MCP has become a focal point for third-party extensibility, enabling developers to build exactly this kind of persistent-state infrastructure without modifying Claude itself. The community around MCP-based tooling is growing quickly, and projects like this one demonstrate how far individual developers can extend Claude's capabilities using freely available cloud infrastructure. The fact that a production-quality semantic memory layer can be self-hosted at zero marginal cost on Cloudflare's free tier lowers the barrier substantially for developers and power users who want persistent, privacy-preserving AI memory without relying on proprietary or subscription-gated solutions.

The broader significance of this project extends beyond its technical implementation. Claude's statelessness between sessions has been a persistent friction point for users who want to build ongoing working relationships with the model — tracking project context, personal preferences, accumulated research, or organizational knowledge over time. Commercial solutions to this problem typically involve vendor-managed memory stored on third-party infrastructure, which raises data sovereignty concerns for privacy-conscious users. A self-hosted, open-source alternative running on widely trusted cloud infrastructure represents a meaningful option for this segment of users. As AI assistants increasingly move from one-off query tools toward persistent collaborative agents, the infrastructure patterns pioneered by projects like this one are likely to inform how both the open-source community and commercial AI providers approach long-term memory architecture.
