I built a Claude Code plugin to help me GM: TTRPG GM Apprentice. Looking for feedback on token efficiency.

Reddit · antthelimey_OG · May 2, 2026
A developer built gm-apprentice, a Claude Code plugin with eight skills to automate tabletop RPG campaign management through a markdown vault system, covering everything from session prep and live-table assistance to post-session processing and campaign auditing. The plugin comprises 33k lines of markdown organized around a shared reference layer and modular workflow skills, with several optimization techniques already employed including compaction passes, proportional content loading, and routing tables. The creator seeks guidance on further token efficiency improvements and best practices for reducing reference file sizes without compromising capability.

Detailed Analysis

A community developer operating under the handle AntTheLimey has released gm-apprentice, a free and open-source Claude Code plugin designed to automate the full lifecycle of tabletop RPG campaign management across systems including Call of Cthulhu, GURPS, Forged in the Dark, and D&D 5e. The plugin comprises eight discrete skills — ttrpg-expert, campaign-organizer, session-prep, session-play, session-wrapup, campaign-qa, vault-ingest, and publish-site — each scoped to a specific phase of the GM workflow, from between-session prep and at-the-table assistance to post-session canonization and public-facing site generation. All campaign state persists in a local markdown vault (Obsidian-compatible or plain filesystem), deliberately kept outside Claude's context window so that sessions remain resumable across any client — desktop, CLI, mobile, or IDE. The plugin is installable via the Claude Code marketplace and also supports Claude Desktop, VS Code, Cursor, and JetBrains, with manual zip-based installation available for free-tier users.

The architectural philosophy behind gm-apprentice reflects considered thinking about how to deploy a large language model as a practical workflow tool rather than a general-purpose assistant. The developer drew a hard line between advisory and operational roles: ttrpg-expert is read-only and never writes to the vault, while the remaining skills are workflow-driven executors. This separation allows independent compaction of the reference layer without affecting operational logic — a meaningful design constraint given that the full plugin spans approximately 33,000 lines of markdown. The session-play skill, optimized for live-table use, is deliberately compressed to around 80 lines and constrained to 1–5 sentence responses, a direct acknowledgment that latency and cognitive overhead at the table differ fundamentally from asynchronous prep work. The plugin itself was built through Claude Code, with the developer serving as domain expert, design decision-maker, and quality gate, while Claude generated code, reference files, CI pipelines, and a vault migration system — a model of human-AI collaborative software development that is becoming increasingly common.
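The advisory/operational split described above can be illustrated with a small sketch. The post does not say how gm-apprentice enforces the read-only rule, so everything here is an assumption made for illustration: the Skill and Vault classes, the can_write flag, and the file paths are all hypothetical. Only the skill names and the rule that ttrpg-expert never writes come from the post.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Skill:
    name: str
    can_write: bool  # hypothetical flag; the post does not describe the real mechanism


# Skill names are taken from the post. Per its description, ttrpg-expert is
# advisory (read-only) and the remaining skills are workflow executors.
SKILL_NAMES = [
    "ttrpg-expert", "campaign-organizer", "session-prep", "session-play",
    "session-wrapup", "campaign-qa", "vault-ingest", "publish-site",
]
SKILLS = {name: Skill(name, can_write=(name != "ttrpg-expert")) for name in SKILL_NAMES}


class Vault:
    """A plain-filesystem markdown vault (illustrative only)."""

    def __init__(self, root: Path):
        self.root = root

    def read(self, rel_path: str) -> str:
        # Any skill may read.
        return (self.root / rel_path).read_text(encoding="utf-8")

    def write(self, skill: Skill, rel_path: str, text: str) -> None:
        # Advisory skills are rejected before anything touches the disk.
        if not skill.can_write:
            raise PermissionError(f"{skill.name} is advisory and may not write to the vault")
        target = self.root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
```

With a guard like this, calling Vault.write with SKILLS["ttrpg-expert"] raises PermissionError, while SKILLS["session-wrapup"] can record a session note. Keeping the check in one place is one way the reference layer could be compacted or rewritten without any risk of it gaining write access.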

The post's central technical concern, token efficiency at scale, is broadly relevant to anyone building complex, multi-skill agentic systems on top of Claude. The developer has already implemented several meaningful optimizations: periodic compaction passes that have achieved 30–60% size reductions on reference files, a shared/ directory housing common schemas to eliminate cross-skill duplication, proportional vault loading scaled to task complexity, and routing tables that allow targeted file access without full-context scanning. These are architecturally sound approaches, and they mirror patterns seen in enterprise RAG (retrieval-augmented generation) systems, where chunking strategy, metadata indexing, and selective retrieval are standard tools for managing context pressure. The developer raises two open questions: whether there is a principled size threshold at which a skill should be split, and how to enforce truly lazy content loading. Both point to a gap in current practice: there is little formal tooling and few documented heuristics for context budget management in agentic plugin architectures.
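To make the routing-table and proportional-loading ideas concrete, here is a minimal sketch. It is not the plugin's implementation: the task names, file paths, budget tiers, and the four-characters-per-token estimate are all assumptions chosen for illustration, and the vault is modeled as an in-memory dictionary so the example stays self-contained.

```python
# Hypothetical routing table: task kind -> vault files to consider,
# listed from most to least important.
ROUTES = {
    "npc-lookup": ["npcs/index.md"],
    "session-prep": [
        "campaign/overview.md",
        "sessions/latest.md",
        "npcs/index.md",
        "locations/index.md",
    ],
}

# Hypothetical token budgets per complexity tier.
BUDGETS = {"quick": 500, "standard": 5_000, "deep": 50_000}


def estimate_tokens(text: str) -> int:
    # Crude rule of thumb (about 4 characters per token); not a real tokenizer.
    return max(1, len(text) // 4)


def select_files(task: str, complexity: str, vault: dict[str, str]) -> list[str]:
    """Pick routed files, in priority order, until the tier's budget is spent."""
    budget = BUDGETS[complexity]
    chosen: list[str] = []
    used = 0
    for path in ROUTES.get(task, []):
        if path not in vault:
            continue  # routed file does not exist in this vault; skip it
        cost = estimate_tokens(vault[path])
        if used + cost > budget:
            break  # stop at the first file that does not fit, preserving priority order
        chosen.append(path)
        used += cost
    return chosen
```

The routing table means only files named for the task are ever considered, and the budget tier scales how many of them are actually loaded. A real implementation would need a proper tokenizer and a policy for files that are individually larger than the budget, neither of which is addressed here.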

The gm-apprentice project sits at the intersection of two broader trends in AI development. The first is the rapid maturation of Claude Code as a platform for building domain-specific agentic tools, with the plugin marketplace serving as a distribution layer that lowers the barrier for community developers to ship non-trivial applications. The second is the growing pattern of using AI systems to build AI-powered systems. The developer explicitly describes Claude as the primary coder, with human oversight focused on domain correctness and quality control, which makes the project a real-world case study in what Anthropic has positioned as a primary use case for Claude Code. The post describes a hobbyist GM with domain expertise and makes no mention of a traditional software engineering background; that such a developer could produce a cross-platform, CI-tested, marketplace-published plugin of this complexity suggests how far a domain expert can now get with this workflow.

What makes gm-apprentice noteworthy beyond its niche utility is the degree to which it functions as a documented experiment in agentic plugin design. The developer has made the tradeoffs explicit (speed vs. depth, shared vs. duplicated context, advisor vs. executor roles) in a way that produces transferable insights for anyone building similarly structured multi-skill Claude plugins. The request for community feedback on token efficiency is itself a signal that the ecosystem around Claude Code is developing a practitioner knowledge base: informal, experience-driven, and cross-pollinated through forums, the kind that historically precedes the emergence of formal best practices. As Anthropic continues to expand Claude Code's capabilities and marketplace infrastructure, open-source projects like gm-apprentice are likely to serve as reference implementations and stress tests for the platform's architectural limits.
