6 months of .md memory, conflicting facts are the hard part

An agent memory system built on markdown files produced hallucinations caused by contradictory facts in its knowledge base. The developer added a human-in-the-loop escalation mechanism, delivered through a Telegram bot, to resolve conflicts, and embeds each resolution as an authoritative reference for future conflicts. After three weeks the developer reports fewer hallucinations, though the right threshold for self-resolution versus human escalation remains undetermined.

Detailed Analysis

A developer who has spent six months running a markdown-based filesystem as memory for AI coding agents has documented a self-hosted memory architecture and the practical problems that emerge as the knowledge base accumulates contradictory information. The system uses a two-tier structure: a "warm" layer updated multiple times daily and at ingestion, and a heavily cross-linked archive layer. Over time, the developer has added cross-linking, truncation, and knowledge extraction, and is now migrating the local filesystem to cloud infrastructure, which suggests the approach has proven durable enough to warrant further investment.

The core technical challenge the developer identifies is hallucinations caused by conflicting facts accumulating within the knowledge base as agents make decisions and extract learnings. Rather than accepting the recency-based resolution that most third-party memory tools employ by default, the developer implemented a human-in-the-loop escalation mechanism via a Telegram bot. When conflicts are detected, unresolvable contradictions are surfaced to the human operator, whose resolution is then embedded and used as a reference point — effectively a "ground truth" anchor — for handling future conflicts of a similar nature. Three weeks into this approach, the developer reports measurable improvement in hallucination rates.
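
The escalate-then-reuse loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the developer's implementation: `ConflictResolver`, `Resolution`, `ask_human`, and `min_similarity` are hypothetical names, the `ask_human` callback stands in for the Telegram bot prompt, and a token-overlap score stands in for embedding similarity. A stored resolution is reused only when its winning statement is one of the two current candidates; anything else goes back to the human.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a crude stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class Resolution:
    conflict_text: str  # the two contradicting statements, joined
    winner: str         # the statement the human chose


@dataclass
class ConflictResolver:
    # Hypothetical hook: in the article's setup this would be a chat bot prompt.
    ask_human: Callable[[str, str], str]
    # Assumed cutoff for treating a past conflict as "similar enough".
    min_similarity: float = 0.6
    precedents: List[Resolution] = field(default_factory=list)

    def find_precedent(self, a: str, b: str) -> Optional[Resolution]:
        """Return the most similar past resolution, if it clears the cutoff."""
        query = f"{a} | {b}"
        best = max(
            self.precedents,
            key=lambda r: jaccard(r.conflict_text, query),
            default=None,
        )
        if best is not None and jaccard(best.conflict_text, query) >= self.min_similarity:
            return best
        return None

    def resolve(self, a: str, b: str) -> str:
        """Reuse a matching human resolution, otherwise escalate and store it."""
        precedent = self.find_precedent(a, b)
        if precedent is not None and precedent.winner in (a, b):
            return precedent.winner
        winner = self.ask_human(a, b)
        self.precedents.append(Resolution(f"{a} | {b}", winner))
        return winner
```

In this sketch the first occurrence of a conflict always reaches the human, and an identical or near-identical conflict later is answered from the stored resolution without another prompt.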

The two open questions the developer raises are among the most substantive unsolved problems in agentic AI memory design. The threshold question — when should a system self-resolve versus escalate — is fundamentally a question about confidence calibration and risk tolerance. Low-stakes factual conflicts may be safely resolved by the agent using heuristics like recency, source authority, or semantic similarity, while high-stakes or domain-critical contradictions may warrant human adjudication. The absence of a principled framework for making this determination automatically is a genuine gap in the current state of agentic systems design.
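
One way to make the threshold question concrete is a routing function that scores each conflicting fact and escalates when the stakes are high or the scores are too close to call. The sketch below is only an illustration of that idea: the `Fact` fields, the equal weighting of recency and source authority, and the `min_margin` value are all assumptions chosen for the example, not values from the developer's system.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional, Tuple


class Route(Enum):
    SELF_RESOLVE = "self_resolve"
    ESCALATE = "escalate"


@dataclass
class Fact:
    text: str
    recorded_at: datetime
    source_authority: float  # assumed scale: 0.0 (untrusted) to 1.0 (trusted)


def score(fact: Fact, newest: datetime, recency_weight: float = 0.5) -> float:
    """Blend recency and source authority into one score in [0, 1]."""
    age_days = (newest - fact.recorded_at).total_seconds() / 86400
    recency = 1.0 / (1.0 + age_days)  # 1.0 for the newest fact, decaying with age
    return recency_weight * recency + (1 - recency_weight) * fact.source_authority


def route_conflict(
    a: Fact, b: Fact, high_stakes: bool, min_margin: float = 0.2
) -> Tuple[Route, Optional[Fact]]:
    """Escalate high-stakes or near-tie conflicts; otherwise pick the higher score."""
    newest = max(a.recorded_at, b.recorded_at)
    score_a, score_b = score(a, newest), score(b, newest)
    if high_stakes or abs(score_a - score_b) < min_margin:
        return Route.ESCALATE, None
    return Route.SELF_RESOLVE, (a if score_a > score_b else b)
```

Here `min_margin` plays the role of the undetermined threshold: raising it sends more conflicts to the human, lowering it lets the agent decide more often. How `high_stakes` gets set is left open, which is exactly the gap the paragraph above describes.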

The second question, whether human input constitutes valid "truth", reflects a deeper epistemological tension in AI memory systems. Human-provided resolutions are authoritative in the sense that they represent the operator's intent and domain knowledge, but they can also encode bias, error, or context-specific judgments that may not generalize. Because each resolution is embedded as ground truth for future conflicts, any error made at resolution time propagates to every later conflict matched against it. This is structurally analogous to a challenge in RLHF (Reinforcement Learning from Human Feedback), where the quality of human preference data directly determines the quality of downstream model behavior.

This developer's experience connects to a broader trend in the AI ecosystem toward persistent, structured agent memory as a first-class engineering concern. As large language models like Claude are increasingly deployed in agentic contexts with long-running tasks and stateful workflows, the question of how agents maintain coherent, non-contradictory world models across sessions has become critical. Most current solutions — vector databases, summarization pipelines, RAG architectures — treat memory retrieval as a search problem rather than a knowledge integrity problem. This developer's approach, by contrast, treats contradiction resolution as a distinct layer of the memory system requiring its own logic and governance, which represents a meaningfully more mature framing of the problem.
