An AI Equity Research Framework: Design Principles and Setup

An AI equity research framework for stock analysis was built using Claude and two MCP tools. It rests on four core principles: defining measurable research questions before retrieval, anchoring hypotheses with prior probabilities, sorting claims into three tiers with corroboration requirements, and using isolated adversarial review to counter confirmation bias. The framework uses ashare-mcp for bulk historical financial data and NotebookLM MCP for semantic search across long documents, running both in parallel with web search so that data retrieval stays separate from Claude's reasoning context. The approach was demonstrated on whether Apple can sustain its Services revenue growth and premium hardware margins in the face of AI competition and regulatory pressure.
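
The parallel retrieval step can be sketched with Python's asyncio. This is a minimal illustration under stated assumptions: `fetch_financials`, `query_documents`, and `web_search` are hypothetical stand-ins for the ashare-mcp, NotebookLM MCP, and web search calls, and each returns a short summary string instead of raw data, mirroring the idea of keeping bulky retrieval output out of the reasoning context.

```python
import asyncio


# Hypothetical stand-ins for the three retrieval channels. The real MCP tool
# names and signatures are not given in the source; these only simulate latency.
async def fetch_financials(ticker: str) -> str:
    await asyncio.sleep(0.1)  # simulate tool / network latency
    return f"financials:{ticker}"


async def query_documents(question: str) -> str:
    await asyncio.sleep(0.1)
    return f"docs:{question}"


async def web_search(query: str) -> str:
    await asyncio.sleep(0.1)
    return f"web:{query}"


async def gather_evidence(ticker: str, question: str) -> dict[str, str]:
    """Run all retrieval channels concurrently and return compact summaries."""
    financials, docs, web = await asyncio.gather(
        fetch_financials(ticker),
        query_documents(question),
        web_search(f"{ticker} {question}"),
    )
    # Only these short summaries would be handed to the reasoning step.
    return {"financials": financials, "documents": docs, "web": web}


if __name__ == "__main__":
    print(asyncio.run(gather_evidence("AAPL", "Services growth durability")))
```

Because the three calls are awaited together, total latency is roughly that of the slowest channel rather than the sum of all three.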

Detailed Analysis

A practitioner-built equity research framework leveraging Claude Code demonstrates how large language model agents can be structured to overcome inherent reasoning weaknesses when applied to complex financial analysis. The system combines Claude Code with two Model Context Protocol (MCP) tools and a roughly 400-line protocol document called Deep Research.md, which governs the agent's behavior from start to finish. The workflow is initiated in Plan Mode with a single instruction directing the agent to follow the protocol strictly before conducting research on a specified ticker. While the current implementation is optimized for Chinese A-share stocks, the protocol architecture itself is designed to be market-agnostic, with the MCP data tools serving as the market-specific layer that can theoretically be swapped out for US-oriented alternatives such as SEC EDGAR integrations or financial data APIs.
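
The split between a market-agnostic protocol and a market-specific data layer can be illustrated with a structural interface. This is a sketch, not the framework's code: `FinancialDataSource`, `AShareSource`, `EdgarSource`, and `revenue_growth` are hypothetical names, and the adapters return placeholder numbers instead of calling ashare-mcp or SEC EDGAR.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class AnnualFinancials:
    year: int
    revenue: float
    net_income: float


class FinancialDataSource(Protocol):
    """Market-specific layer: any class with this method can back the protocol."""

    def annual_financials(self, ticker: str) -> list[AnnualFinancials]: ...


class AShareSource:
    """Hypothetical adapter standing in for an ashare-mcp-backed source."""

    def annual_financials(self, ticker: str) -> list[AnnualFinancials]:
        # Placeholder values for illustration only, not real company data.
        return [AnnualFinancials(2022, 100.0, 10.0), AnnualFinancials(2023, 110.0, 12.0)]


class EdgarSource:
    """Hypothetical adapter standing in for an SEC EDGAR-backed source."""

    def annual_financials(self, ticker: str) -> list[AnnualFinancials]:
        # Placeholder values for illustration only, not real company data.
        return [AnnualFinancials(2022, 200.0, 30.0), AnnualFinancials(2023, 190.0, 25.0)]


def revenue_growth(source: FinancialDataSource, ticker: str) -> float:
    """Market-agnostic logic: depends only on the FinancialDataSource interface."""
    rows = sorted(source.annual_financials(ticker), key=lambda r: r.year)
    first, last = rows[0], rows[-1]
    return (last.revenue - first.revenue) / first.revenue
```

Swapping markets then means passing a different adapter, for example `revenue_growth(EdgarSource(), "EXAMPLE")`, while the analysis function stays unchanged.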

The four design principles underpinning the protocol each target a documented failure mode of LLMs performing open-ended research tasks. The first requires the agent to define a falsifiable research question with a measurable threshold before retrieval begins, which keeps the model from defaulting to statistically probable but analytically hollow output. The second requires the agent to commit to hypotheses and assign explicit prior probabilities before seeing any data, creating an anchor that makes belief updating traceable instead of a post-hoc rationalization. The third establishes a three-tier claim classification (critical, supporting, and background) and requires high-confidence assertions to be corroborated by genuinely independent sources, where "independent" means organizationally and methodologically separate rather than merely more numerous. The fourth addresses confirmation bias at the architectural level: a separate subagent with a clean context window performs the adversarial review, so the reasoning chains behind the bull case cannot contaminate the challenge.
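
The first three principles can be expressed as small, checkable rules. The sketch below is illustrative only: the class and function names are hypothetical, the per-tier minimum source counts are assumed values (the protocol's actual numbers are not stated), and the independence test is a simple greedy rule that treats two sources as independent only if they differ in both organization and methodology.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ResearchQuestion:
    """Principle 1: a falsifiable question tied to a measurable threshold."""

    metric: str
    threshold: float

    def is_supported(self, observed: float) -> bool:
        # The thesis survives only if the observed value clears the threshold.
        return observed >= self.threshold


def bayes_update(prior: float, p_if_true: float, p_if_false: float) -> float:
    """Principle 2: revise a pre-committed prior with Bayes' rule.

    p_if_true / p_if_false are the probabilities of observing the evidence
    if the hypothesis is true / false.
    """
    joint_true = prior * p_if_true
    return joint_true / (joint_true + (1 - prior) * p_if_false)


class Tier(Enum):
    CRITICAL = "critical"
    SUPPORTING = "supporting"
    BACKGROUND = "background"


# Assumed corroboration minimums per tier; the source gives no numbers.
MIN_INDEPENDENT = {Tier.CRITICAL: 2, Tier.SUPPORTING: 1, Tier.BACKGROUND: 0}


@dataclass(frozen=True)
class Source:
    organization: str
    methodology: str


def independent_count(sources: list[Source]) -> int:
    """Principle 3: greedily count sources separated by organization AND method."""
    kept: list[Source] = []
    for s in sources:
        if all(
            s.organization != k.organization and s.methodology != k.methodology
            for k in kept
        ):
            kept.append(s)
    return len(kept)


def is_corroborated(tier: Tier, sources: list[Source]) -> bool:
    return independent_count(sources) >= MIN_INDEPENDENT[tier]
```

Two brokers using the same channel-check method count as one source under this rule, which is the point of defining independence by separation rather than by citation count.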

The two MCP tools supply the protocol's data requirements in ways that conventional web search cannot. The ashare-mcp tool retrieves full historical financials (income statements, balance sheets, cash flows, and per-share metrics) in a single call, so the context window is not consumed by sequential year-by-year fetching before substantive analysis begins. The NotebookLM MCP tool handles semantic retrieval over long documents such as annual reports and regulatory filings: the agent queries uploaded documents in natural language and receives targeted answers without loading the full text into Claude's context. Together, the tools cover the two kinds of data the protocol depends on: longitudinal quantitative data and qualitative documentary evidence.
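
The cost difference between bulk and year-by-year retrieval shows up in the number of MCP round trips. MCP tool invocations are JSON-RPC 2.0 `tools/call` requests carrying a tool `name` and `arguments`; the tool names `get_statement` and `get_full_financials` and their argument schemas below are hypothetical, since ashare-mcp's actual interface is not described here.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


STATEMENTS = ("income", "balance", "cashflow")


def per_year_requests(ticker: str, years: range) -> list[str]:
    """Pattern the bulk tool avoids: one call (and one result) per statement per year."""
    requests = []
    for year in years:
        for statement in STATEMENTS:
            args = {"ticker": ticker, "statement": statement, "year": year}
            requests.append(tools_call(len(requests) + 1, "get_statement", args))
    return requests


def bulk_request(ticker: str, start_year: int, end_year: int) -> list[str]:
    """Pattern the bulk tool enables: every statement and year in one call."""
    args = {"ticker": ticker, "start_year": start_year, "end_year": end_year}
    return [tools_call(1, "get_full_financials", args)]
```

For ten years of three statements, the per-year pattern issues thirty requests, each adding a tool result to the context, while the bulk pattern issues one.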

The framework reflects a broader trend toward structured agentic workflows that treat LLM failure modes as engineering constraints to design around rather than model flaws to fix. Projects like FinRobot have pursued similar layered architectures, separating data processing, concept extraction, and thesis synthesis into distinct Chain-of-Thought layers, but the approach described here stands out for enforcing epistemic discipline through a protocol rather than through model fine-tuning or architectural changes alone. Using subagent context isolation as a structural defense against confirmation bias is especially significant: it acknowledges that a single-context agent cannot reliably critique its own outputs after constructing them, a limitation that prompt engineering alone does not resolve.
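
The isolation idea can be illustrated with plain data structures. This is a minimal sketch, assuming the reviewer receives only the final thesis and the cited evidence; `Message`, `AgentContext`, and `spawn_adversarial_reviewer` are hypothetical names and do not describe how Claude Code actually creates subagents.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str
    content: str


@dataclass
class AgentContext:
    """A conversation history; each agent owns its own message list."""

    messages: list[Message] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        self.messages.append(Message(role, content))


def spawn_adversarial_reviewer(thesis: str, evidence: list[str]) -> AgentContext:
    """Start a reviewer from an empty context holding only the thesis and evidence.

    The bull-case agent's intermediate reasoning is deliberately not a
    parameter, so it cannot leak in and anchor the critique.
    """
    reviewer = AgentContext()  # fresh list: nothing inherited from another agent
    reviewer.add("system", "Act as an adversarial reviewer: try to falsify the thesis.")
    evidence_lines = "\n".join(f"- {item}" for item in evidence)
    reviewer.add("user", f"Thesis: {thesis}\nEvidence:\n{evidence_lines}")
    return reviewer
```

The design choice is enforced by the function signature itself: there is no way to hand the reviewer the bull case's reasoning chain, only its conclusion and the evidence it cites.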

This kind of practitioner-driven tooling around Claude Code illustrates the growing ecosystem of domain-specific agent frameworks built on top of frontier models. The explicit separation of protocol logic, data retrieval infrastructure, and model execution reflects a growing recognition among sophisticated users that model capability is necessary but not sufficient for reliable analytical work; governing the reasoning process itself matters just as much. As MCP tooling matures and more domain-specific data pipelines become available, similar frameworks are likely to emerge in other research-intensive industries where structured, verifiable reasoning chains are essential.