I wrote 74 Claude Code skills. Most were theater. Here are the 3 that actually changed what the agent does.

An author who created 74 Claude Code skills found that most produced no discernible change in agent behavior, discovering that skills teaching knowledge the model already possessed were ineffective. The three types of skills that actually improved performance were those that removed choices rather than adding knowledge, established prohibitions on default behaviors, and triggered at precisely the right moments. Constraining what the agent could do proved substantially more valuable than attempting to teach it additional information.

Detailed Analysis

A developer's retrospective on building 74 custom skills for Claude Code has surfaced a counterintuitive but practically significant insight: the overwhelming majority of skill-writing effort produces no measurable change in agent behavior, and the minority that does work operates through constraint rather than instruction. The post, shared to r/ClaudeAI, distills months of experimentation into a single diagnostic question — whether a given skill removes a choice the model keeps getting wrong, as opposed to adding knowledge the model already possesses. The author ultimately deleted several skills covering "best practices" and observed no difference in output quality, confirming that wrapping known information in structured prompts generates token overhead without behavioral change.

The three categories of skills the author identifies as genuinely effective share a common architecture: they narrow the decision space available to the agent rather than expanding its knowledge base. A release pipeline skill that designates one canonical script and explicitly forbids all alternative paths prevented version drift across multiple files — not because Claude lacked understanding of package publishing, but because Claude's default disposition toward helpful improvisation led it to apply reasonable-looking solutions in multiple locations simultaneously. Similarly, explicit prohibitions ("never run npm publish directly," "never commit this file") proved more valuable than pages of positive guidance. This maps to a known challenge in AI agent design: capable models frequently fail not from ignorance but from over-eager pattern completion in contexts where restraint is the correct behavior.

The author's third insight — that trigger precision outweighs skill prose quality — points to a systems-level concern that is often underweighted in discussions of prompt engineering. A skill that loads at the wrong moment or fails to load at the right one is effectively absent regardless of its internal quality. The observation that stale skills are worse than no skills at all adds a maintenance dimension that complicates the common assumption that more structured guidance is always additive. When a skill describes a workflow that has since changed, it introduces authoritative misinformation into the agent's context, and the agent's tendency to trust structured instructions over ambient signals means it may act on the outdated description in preference to observable reality.

This experience reflects a broader tension in AI agent tooling between the expressive flexibility that makes large language models powerful and the operational reliability that production workflows require. The most capable base models are, by design, generalists inclined toward completing tasks through whatever path appears reasonable given available context. That flexibility is the source of their utility and the source of the failure mode the author describes. Custom skill layers that function primarily as constraint mechanisms — defining the edges of acceptable behavior rather than the content of correct behavior — effectively transform a general-purpose reasoner into something closer to a bounded specialist for specific workflow contexts. The author's instinct that "constrain, don't teach" may be the whole game is consistent with how guardrails and policy enforcement tend to add value in agentic systems where the base model's competence is already high.

The post also implicitly raises questions about the scalability of skill-based customization as a design pattern. With 74 skills accumulated over several months, the author is already spending significant maintenance effort correcting or deleting skills that have drifted from current reality. As Claude Code and similar agentic development tools gain adoption, the lifecycle management of custom instruction sets — versioning, deprecation, conflict resolution between overlapping skills — is likely to become a meaningful operational challenge. The author's current preference for many small, single-purpose skills over fewer large ones suggests a modularity instinct that parallels software engineering conventions, but it also multiplies the number of discrete artifacts requiring maintenance, making the stale-skill problem proportionally more acute at scale.

Read original article →

Detailed Analysis

Don't Miss a Deploy