Keep Claude working toward a goal - Claude Code Docs

The `/goal` command enables Claude to autonomously work toward a completion condition across multiple turns, with an evaluator checking after each turn whether the specified condition is met and directing Claude to continue working if not. Effective goals feature a measurable end state such as passing tests or cleaning up code, along with a stated verification method and relevant constraints, and can optionally include turn or time limits. The feature requires workspace trust settings and functions as a session-scoped shortcut that automatically clears when the condition is satisfied.

Detailed Analysis

Anthropic has introduced a `/goal` command in Claude Code that enables sustained, autonomous multi-turn task execution by allowing users to define a completion condition that a secondary model continuously evaluates after each working turn. Rather than requiring a human to prompt Claude at every step, the feature delegates the decision of "are we done yet?" to a lightweight evaluator model — defaulting to Claude Haiku — that reads the conversation transcript after each turn and returns a yes-or-no verdict along with a brief reasoning note. If the condition is not yet satisfied, Claude automatically initiates another working turn, continuing this loop until the evaluator confirms success, at which point the goal is marked as achieved and control returns to the user. The command accepts conditions of up to 4,000 characters, supports non-interactive execution via command-line flags, and persists across session resumes when work is interrupted.

The design choices embedded in `/goal` reveal deliberate thinking about reliability and scope separation. Anthropic explicitly distinguishes it from two related mechanisms: `/loop`, which restarts on a time interval rather than task completion, and Stop hooks, which are persistent, settings-file-level configurations that can invoke deterministic scripts. The `/goal` command occupies a middle ground — session-scoped and model-evaluated — making it more flexible than a hardcoded script but more principled than simply asking Claude to self-assess. The documentation's instruction to write conditions as things "Claude's own output can demonstrate" is a significant design constraint: the evaluator cannot independently read files or run commands, so the entire burden of evidence falls on what Claude surfaces in the conversation transcript. This forces users to write verifiable, output-grounded conditions rather than vague aspirational goals.

The practical use cases Anthropic highlights — migrating codebases to new APIs, working through issue backlogs, splitting large files until size budgets are met — all share a common structure: they are iterative, have measurable terminal states, and require more consecutive working time than a single prompt-response cycle can accommodate. This positions `/goal` squarely as a tool for agentic software engineering workflows, where the unit of work is not a question-and-answer exchange but a sustained campaign of code changes, test runs, and verification steps. The feature's compatibility with non-interactive mode and Remote Control further signals that Anthropic envisions Claude Code operating in automated pipelines — CI-adjacent workflows where a human engineer sets an objective and returns later to a completed result rather than supervising each step.

Taken in broader context, `/goal` reflects an industry-wide shift toward what might be called "outcome-directed" AI tooling, wherein large language models are increasingly expected not just to produce outputs on demand but to pursue defined objectives over extended, self-directed work sessions. The architectural choice to separate the working agent from the evaluating agent — using a fast, cheap model to judge completion rather than relying on the same model that performed the work — addresses a well-documented failure mode in agentic AI systems: self-serving or optimistic self-assessment. By introducing an independent evaluator, Anthropic builds in a structural check against Claude prematurely declaring success. This mirrors patterns emerging across agentic AI frameworks broadly, where multi-model architectures with distinct roles for execution, verification, and critique are becoming standard engineering practice rather than experimental novelty.

Read original article →

Detailed Analysis

Don't Miss a Deploy