I built an MCP server that turns Claude into an emergency medicine assistant — what I learned building AI for high-stakes domains

EMSy is an MCP server that connects Claude to a RAG pipeline of ERC/AHA guidelines and PubMed to provide emergency medicine assistance through three tools: fast clinical Q&A, deep case reasoning, and evidence appraisal. Building AI for high-stakes clinical domains requires careful scope-of-practice distinctions between professions, automatic archival of outdated protocols, risk-based query routing to optimize costs, and mandatory source citations for every answer. These design choices reflect the fact that errors in healthcare carry far greater consequences than errors in typical AI applications.

Detailed Analysis

A developer behind the "EMSy" project has built a Model Context Protocol (MCP) server that integrates Claude with a retrieval-augmented generation (RAG) pipeline drawing on European Resuscitation Council (ERC) and American Heart Association (AHA) guidelines and PubMed literature. This effectively positions Claude as a clinical decision-support tool for emergency medicine professionals. The system exposes three distinct tool modes: fast clinical Q&A for pharmacology, protocols, and procedures; deep reasoning for complex case analysis and differential diagnosis; and evidence appraisal for evaluating studies and protocols. Rather than acting as a monolithic AI assistant, EMSy uses a layered architecture that classifies each query into a risk tier (low, medium, high, or critical) and routes it to an appropriately capable model, balancing cost efficiency against the precision that life-critical queries demand. Every response must include exact source citations, a design choice that reflects a core epistemic standard of clinical medicine: assertions without traceable provenance are inadmissible in high-stakes decision environments.
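The article does not publish EMSy's routing logic, so the following is only a minimal sketch of risk-tier routing under stated assumptions. The keyword lists, the `classify` and `route` helpers, and the model identifiers are all hypothetical placeholders. A production router would more plausibly use a trained classifier or a model call than substring matching.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskTier(IntEnum):
    """The four tiers named in the article; integer values order them by severity."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# Hypothetical trigger phrases; a real system would likely use a classifier model.
TIER_KEYWORDS = {
    RiskTier.CRITICAL: ("cardiac arrest", "anaphylaxis", "airway", "pediatric dose"),
    RiskTier.HIGH: ("dose", "dosing", "infusion", "contraindication"),
    RiskTier.MEDIUM: ("differential", "protocol", "procedure"),
}

# Placeholder model identifiers, not real model names.
MODEL_FOR_TIER = {
    RiskTier.LOW: "small-fast-model",
    RiskTier.MEDIUM: "mid-model",
    RiskTier.HIGH: "large-model",
    RiskTier.CRITICAL: "large-model",
}


@dataclass
class Route:
    tier: RiskTier
    model: str


def classify(query: str) -> RiskTier:
    """Return the most severe tier whose trigger phrases appear in the query."""
    text = query.lower()
    # Check CRITICAL first, then HIGH, then MEDIUM; fall through to LOW.
    for tier in sorted(TIER_KEYWORDS, reverse=True):
        if any(keyword in text for keyword in TIER_KEYWORDS[tier]):
            return tier
    return RiskTier.LOW


def route(query: str) -> Route:
    """Pair the query's risk tier with the model assigned to that tier."""
    tier = classify(query)
    return Route(tier=tier, model=MODEL_FOR_TIER[tier])
```

Because `classify` checks the most severe tier first, a query that mentions both a dose and cardiac arrest is routed as critical. A more conservative variant might default unmatched queries to a higher tier rather than to `LOW`, trading some cost for safety.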

The engineering challenges the developer documents point to domain-specific constraints that generic AI deployments rarely face with the same severity. Chief among these is the scope-of-practice problem: a paramedic and a physician operate under legally and clinically distinct authorities, so a pharmacological or procedural answer that is technically correct for one role may be actively dangerous when applied by the other. This calls for profession-aware prompt construction, a layer of role-specificity that general-purpose assistants typically abstract away but that emergency medicine cannot afford to ignore. Equally critical is the guideline expiry problem. Clinical protocols, particularly in emergency and resuscitation medicine, are revised as new evidence emerges, and a RAG system that surfaces a superseded AHA guideline could contribute to patient harm. The developer's solution, automatically archiving stale protocols, addresses what is fundamentally a data lifecycle governance challenge, one that is far more consequential in this domain than in most commercial applications.
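Both constraints described above can be sketched together. This is an illustrative example, not EMSy's code. The role names and scope wording in `SCOPE_NOTES`, the `GuidelineDoc` record, and the rule that the newest document per topic supersedes older ones are all assumptions made for the sketch. Real scope-of-practice rules vary by jurisdiction and would come from local protocols rather than a hard-coded dictionary.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical scope instructions; real scopes depend on jurisdiction and service.
SCOPE_NOTES = {
    "paramedic": "Answer within paramedic scope of practice and local protocols; "
                 "flag interventions that require physician authorisation.",
    "physician": "Answer at physician level, including options outside "
                 "prehospital protocols where the evidence supports them.",
}


def build_system_prompt(role: str) -> str:
    """Prefix every session with a role-specific scope instruction."""
    if role not in SCOPE_NOTES:
        # Refuse rather than guess: an unknown role has no defined scope.
        raise ValueError(f"unknown role: {role!r}")
    return f"You are assisting a {role}. {SCOPE_NOTES[role]} Cite every source."


@dataclass
class GuidelineDoc:
    doc_id: str
    topic: str
    published: date
    archived: bool = False


def archive_superseded(docs: list[GuidelineDoc]) -> list[GuidelineDoc]:
    """Mark all but the newest document per topic as archived; return the active ones."""
    newest: dict[str, GuidelineDoc] = {}
    for doc in docs:
        current = newest.get(doc.topic)
        if current is None or doc.published > current.published:
            newest[doc.topic] = doc
    for doc in docs:
        # Identity check: only the single newest object per topic stays active.
        doc.archived = doc is not newest[doc.topic]
    return [doc for doc in docs if not doc.archived]
```

In a RAG pipeline, a step like `archive_superseded` would most naturally run at ingestion time, so that retrieval only ever indexes documents that are still current.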

These lessons align closely with Anthropic's own internal research on agentic AI deployment. An Anthropic analysis of over 998,000 tool calls on its public API identified healthcare as one of the emerging sectors for Claude-based agents, alongside finance and cybersecurity, and flagged that high-risk action clusters — even when rare — carry disproportionate harm potential in clinical contexts. Anthropic's research has consistently emphasized that agents operating on longer action horizons require robust human-AI oversight loops, and that even experienced users must maintain the capacity to identify what the company describes as "smart but dangerous" outputs — fluent, confident-sounding responses that nonetheless contain clinically consequential errors. The EMSy project's mandatory citation requirement is a direct architectural response to this risk: by forcing the model to surface its evidentiary basis, the system creates a mechanism for clinician verification that partially compensates for the absence of a real-time human-in-the-loop.
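One way to make the citation requirement enforceable rather than advisory is to validate each answer before it is returned. The sketch below assumes a hypothetical convention in which retrieved chunks are numbered and cited inline as `[S1]`, `[S2]`, and so on, placed before each sentence's closing punctuation. The function name, error type, and per-sentence policy are illustrative assumptions, not details from the EMSy project.

```python
import re

# Hypothetical convention: retrieved chunks are cited inline as [S1], [S2], ...
CITATION_PATTERN = re.compile(r"\[S(\d+)\]")

# Naive splitter: breaks after '.', '!' or '?' followed by whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


class UncitedAnswerError(ValueError):
    """Raised when an answer violates the assumed citation policy."""


def check_citations(answer: str, retrieved_ids: set[int]) -> set[int]:
    """Return the source ids cited in `answer`, or raise UncitedAnswerError.

    Assumed policy: the answer is non-empty, every sentence carries at least
    one citation, and every citation refers to a source that was retrieved.
    """
    sentences = [s.strip() for s in SENTENCE_SPLIT.split(answer) if s.strip()]
    if not sentences:
        raise UncitedAnswerError("empty answer")
    cited: set[int] = set()
    for sentence in sentences:
        ids = {int(n) for n in CITATION_PATTERN.findall(sentence)}
        if not ids:
            raise UncitedAnswerError(f"uncited sentence: {sentence!r}")
        unknown = ids - retrieved_ids
        if unknown:
            raise UncitedAnswerError(f"cites unretrieved sources: {sorted(unknown)}")
        cited |= ids
    return cited
```

An answer that fails the check could be regenerated or returned with a visible warning; either way, the clinician only sees claims that point back to a retrieved source. The naive sentence splitter will mis-split abbreviations such as "e.g.", which a real system would need to handle.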

The project is also a concrete illustration of the broader trend toward domain-specialized AI deployments built atop foundation models rather than trained from scratch. Rather than developing a bespoke clinical language model — an undertaking requiring vast proprietary datasets and regulatory navigation — the developer leverages Claude's general reasoning capabilities and constrains them through retrieval grounding, role-aware prompting, and tiered risk routing. This architectural pattern reflects a maturing understanding in the AI development community that foundation model capability, when properly scaffolded, can be directed toward highly specialized tasks with considerably less infrastructure than traditional ML pipelines. The MCP server paradigm specifically enables this modularity, allowing tools to be composed, swapped, or updated independently as clinical guidelines evolve or new evidence emerges.
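In MCP, a server advertises its tools to the client (through the protocol's `tools/list` request), and the client invokes them by name (`tools/call`). The plain-Python sketch below mimics only that name-to-handler table to show why tools can be swapped independently. It deliberately avoids the MCP SDK, and the tool names and stub handlers are hypothetical stand-ins for EMSy's three modes.

```python
from collections.abc import Callable

ToolHandler = Callable[[str], str]


class ToolRegistry:
    """Minimal stand-in for an MCP server's tool table (not the MCP SDK API)."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolHandler] = {}

    def register(self, name: str, handler: ToolHandler) -> None:
        # Re-registering a name replaces only that tool's handler.
        self._tools[name] = handler

    def names(self) -> list[str]:
        # Analogous to what a client would discover via tools/list.
        return sorted(self._tools)

    def call(self, name: str, query: str) -> str:
        # Analogous to a client invoking a tool via tools/call.
        if name not in self._tools:
            raise KeyError(f"no such tool: {name}")
        return self._tools[name](query)


registry = ToolRegistry()
registry.register("clinical_qa", lambda q: f"[qa] {q}")
registry.register("case_reasoning", lambda q: f"[reasoning] {q}")
registry.register("evidence_appraisal", lambda q: f"[appraisal] {q}")

# Swap in a new Q&A handler (say, after a guideline update) without touching the others.
registry.register("clinical_qa", lambda q: f"[qa-v2] {q}")
```

The point of the design is isolation: updating the Q&A tool's retrieval index or prompt leaves the reasoning and appraisal tools, and the client's view of them, unchanged.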

What EMSy ultimately demonstrates is that deploying AI in high-stakes domains is less a problem of raw model capability and more a problem of institutional and epistemic alignment — ensuring the system knows not just what is true, but what is true for whom, under what authority, and as of when. The developer's four documented lessons (scope-of-practice differentiation, guideline recency enforcement, risk-tiered routing, and mandatory citation) constitute something close to a minimum viable safety specification for clinical AI deployment. As Claude and similar models increasingly appear in healthcare-adjacent applications — whether through formal medical device pathways or informal clinical decision support tools — the engineering patterns documented in projects like EMSy will likely inform both practitioner-built systems and the more rigorous regulatory frameworks that such deployments will eventually require.
