A medical student with no coding experience tried to create a studying agent: Felicity.

A medical student without coding experience developed a personalized studying agent called Felicity, using Claude Opus 4.7 for optimization and Sonnet 4.6 with Adaptive Thinking for agent operation within Co-Work. The student split the original prompt into separate sub-agents, which improved system stability but left it unclear whether the underlying problems were being solved or merely masked by temporary fixes. The creator asked for help identifying and addressing the system's internal problems.

Detailed Analysis

A medical student with no prior coding experience documented their attempt to build a personalized AI studying agent called Felicity using Anthropic's Claude ecosystem, sharing both their architectural choices and their frustrations in a candid Reddit post on r/ClaudeAI. The student structured the project around Claude's Co-Work platform, which Anthropic designed to make agentic workflows accessible to people who do not work in a terminal, which is exactly this student's situation. Their model selection strategy was deliberate, if intuitive: Claude Opus 4.7 for high-level workflow design and optimization tasks, and Claude Sonnet 4.6 with Adaptive Thinking enabled for the actual runtime agent interactions. The project evolved from a single large prompt into a segmented multi-agent architecture distributed across Co-Work's sub-agent framework.

The student's central concern, that they are "duct taping irrelevant pieces together rather than solving problems," reflects a challenge in agentic AI development that even experienced engineers encounter. When a system is built incrementally through prompt engineering rather than from a coherent architectural plan, surface-level stability can mask structural incoherence. Each sub-agent may work in isolation while the system as a whole lacks principled data flow, context handoff, or error propagation logic. The student's instinct that something is wrong despite apparent functionality is, in this sense, diagnostically sound. The absence of coding experience compounds the problem: without the ability to inspect logs, trace execution paths, or write unit tests, it is substantially harder to find where internal failures originate.
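To make "principled data flow, context handoff, and error propagation" concrete, here is a minimal Python sketch. It is not how Felicity or Co-Work works: the `Handoff` structure, the `planner` and `quizzer` stages, and `run_pipeline` are all hypothetical names, and each stage is a plain function standing in for a sub-agent call. The point is only that every stage receives and returns the same explicit record, and that a failure is recorded with the name of the stage that produced it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Handoff:
    """Hypothetical record passed from one sub-agent to the next."""
    topic: str
    notes: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def planner(h: Handoff) -> Handoff:
    # Stand-in for a planning sub-agent; a real one would call a model.
    if not h.topic.strip():
        h.errors.append("planner: empty topic")
        return h
    h.notes.append(f"plan: review {h.topic}, then self-test")
    return h


def quizzer(h: Handoff) -> Handoff:
    # Stand-in for a quiz sub-agent; it refuses to run without a plan.
    if not any(n.startswith("plan:") for n in h.notes):
        h.errors.append("quizzer: no plan received")
        return h
    h.notes.append(f"quiz: 3 questions on {h.topic}")
    return h


def run_pipeline(topic: str, stages: list[Callable[[Handoff], Handoff]]) -> Handoff:
    h = Handoff(topic=topic)
    for stage in stages:
        h = stage(h)
        if h.errors:
            # Stop at the first failure so the error names the stage
            # where it originated instead of surfacing further downstream.
            break
    return h


ok = run_pipeline("renal physiology", [planner, quizzer])
bad = run_pipeline("  ", [planner, quizzer])
```

In this sketch `ok.errors` is empty and `ok.notes` holds one entry per stage, while `bad.errors` contains only the planner's message because the quiz stage never runs. A pipeline patched together without such a shared record tends to fail in a later stage, far from the actual cause.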

The model selection approach the student describes is a reasonable heuristic, but it carries risk. Using a more powerful, slower model like Opus for architectural decisions and a faster, more cost-efficient model like Sonnet at runtime is a recognized pattern in tiered agentic design. However, the Adaptive Thinking toggle on Sonnet 4.6, which enables extended reasoning chains, adds latency and token cost that may hurt a studying tool where responsiveness matters. More critically, if the sub-agents were designed around Opus's reasoning depth and then deployed with Sonnet, capability mismatches may explain some of the instability the student is observing. Choosing the right model is not just a matter of speed versus power, but of whether the model's reasoning profile matches the task it is asked to perform.
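The tiering idea can be written down as an explicit rule rather than left implicit in each prompt. The sketch below is an illustration under stated assumptions: the task kinds, the `"large"` and `"small"` tier labels, and the `choose_model` function are hypothetical, and the labels are not real model identifiers or Co-Work settings.

```python
from typing import NamedTuple


class ModelChoice(NamedTuple):
    tier: str                # "large" or "small"; illustrative labels only
    extended_thinking: bool  # whether to allow longer reasoning chains


# Hypothetical task kinds that are done rarely and offline.
DESIGN_TASKS = {"architecture", "prompt_optimization"}


def choose_model(task_kind: str, latency_sensitive: bool) -> ModelChoice:
    if task_kind in DESIGN_TASKS:
        # Design work: quality matters more than response time.
        return ModelChoice("large", True)
    # Runtime work: use the faster tier, and enable extended reasoning
    # only when the user can tolerate the extra latency and tokens.
    return ModelChoice("small", not latency_sensitive)


design = choose_model("architecture", latency_sensitive=True)
quiz = choose_model("flashcard_quiz", latency_sensitive=True)
essay = choose_model("essay_feedback", latency_sensitive=False)
```

Writing the rule out this way also exposes the mismatch risk described above: a sub-agent whose prompt was tuned on the large tier should be re-tested on whatever tier the rule actually assigns it at runtime.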

This post sits at the intersection of two significant trends in AI development. First, Anthropic's deliberate push to democratize agentic tooling through Co-Work represents a bet that domain experts such as doctors, lawyers, and researchers will become capable AI system builders without needing to become software engineers. The Felicity project is a real-world test of that thesis, and the student's mixed experience shows that accessible tooling does not eliminate the conceptual complexity of agentic system design. Second, debugging prompt-based multi-agent systems remains a largely unsolved problem across the industry. Without native observability tools such as execution traces, inter-agent message logs, and confidence scoring, users building in environments like Co-Work are largely flying blind, which explains the student's sense of navigating without instruments.
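For readers unfamiliar with what an execution trace or inter-agent message log looks like, the following Python sketch shows the basic idea using only the standard library. It is an assumption-laden illustration, not a Co-Work feature: `traced`, `TRACE`, `summarizer`, and `flaky` are hypothetical names, and the decorated functions stand in for sub-agent calls.

```python
import functools
import time

# In-memory trace; a real setup might append JSON lines to a file instead.
TRACE = []


def traced(agent_name):
    """Record input, output, status, and duration of every agent call."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(message):
            record = {"agent": agent_name, "input": message}
            start = time.perf_counter()
            try:
                record["output"] = fn(message)
                record["status"] = "ok"
                return record["output"]
            except Exception as exc:
                record["status"] = "error"
                record["output"] = repr(exc)
                raise  # still propagate, but only after the failure is logged
            finally:
                record["seconds"] = round(time.perf_counter() - start, 4)
                TRACE.append(record)
        return inner
    return wrap


@traced("summarizer")
def summarizer(message):
    return message.upper()  # stand-in for a model call


@traced("flaky")
def flaky(message):
    raise ValueError("no context received")  # simulated sub-agent failure
```

After a few calls, reading `TRACE` from top to bottom shows which agent received what, what it returned, and where the first `"error"` status appears. That ordered record is the minimum needed to tell a real fix from a patch that merely hides a failure.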

The broader significance of this post is its honest articulation of the gap between what AI agent platforms promise and what non-technical builders can currently achieve on their own. The student built something functional enough to deploy and use, which is itself meaningful, but they correctly sense that intuition and iteration alone are insufficient for building reliable systems. As Anthropic and its competitors continue pushing agentic tools toward general audiences, the tooling gap, specifically the absence of accessible debugging, interpretability, and testing infrastructure for non-coders, is likely to become one of the most consequential design problems in applied AI product development.
