Detailed Analysis
A developer sharing their AI-assisted coding workflow on Reddit has offered a notable window into how power users are orchestrating multiple large language models simultaneously to approximate a structured software engineering team. The post describes an opening prompt used to coordinate Claude Code, Google's Gemini, and OpenAI's Codex across distinct roles: Gemini handles abstract reasoning and external test creation, Codex performs code implementation and secondary review, and Claude Code serves as the operator and architectural reviewer. The author, who maintains maximum subscription plans across all three services, uses this prompt at the start of coding sessions alongside a markdown project overview document broken into phases. The core directive is explicit about sequencing — no immediate building, but scrutiny and clarification first.
The workflow reflects a deliberate attempt to replicate software engineering best practices using AI agents as stand-ins for specialized team roles. Separating test authorship from code implementation, for instance, mirrors the logic behind independent QA teams in traditional development environments — the idea being that the entity writing the code should not be the sole arbiter of whether the code works. Assigning Gemini to write tests "from an outside perspective" attempts to reduce confirmation bias, a well-documented problem when developers test their own code. Claude Code's designation as operator and architectural reviewer similarly echoes a tech lead or principal engineer role, tasked not just with correctness but with longer-horizon considerations like extensibility and code bloat.
The author's self-identified gap — whether "architecturally sound" is sufficiently defined — points to a fundamental limitation in prompt-based AI orchestration. Abstract quality standards like architectural soundness encompass a wide range of contested engineering philosophies: SOLID principles, domain-driven design, separation of concerns, scalability patterns, and more. Without explicit definition, each model defaults to its own training-derived interpretation, which may be inconsistent across sessions or models. The author's instinct to distrust LLMs on this dimension is well-founded; models trained on vast but heterogeneous codebases can reproduce common patterns without necessarily understanding their tradeoffs in a specific project context.
This approach sits within a broader trend of multi-agent AI development frameworks gaining traction among both individual developers and enterprise teams. Tools like Anthropic's own multi-agent scaffolding, LangChain, and AutoGen have formalized the concept of LLM role specialization, and the Reddit post represents a manually constructed version of similar architectures. The fact that a non-enterprise user is independently converging on role-separated, review-gated LLM pipelines suggests the pattern has intuitive appeal grounded in real software engineering logic. However, the workflow depends heavily on Claude Code's ability to faithfully enforce coordination rules across tool calls — a task that remains imperfect given current model tendencies toward compliance over critical resistance.
The community framing of the post — inviting both theft and critique — underscores how prompt engineering for agentic coding workflows is increasingly treated as a transferable, shareable craft skill rather than proprietary methodology. Developers are iterating on these orchestration patterns in public forums much as open-source communities iterate on code, suggesting that the meta-skill of directing AI systems effectively is becoming a first-class engineering competency in its own right. The prompt's explicit concern with over-coding and architectural longevity signals a maturation in how at least some practitioners are thinking about AI-assisted development — moving past "does it work now" toward "will it still work later."
Read original article →