Do effort levels change Claude's refusal posture, or only the depth of the answer? CVP Run 6: Opus 4.7 at three effort levels

CVP Run 6 evaluated Claude Opus 4.7 at three effort tiers (medium, high, and xhigh) using 13 prompts, yielding 39 transcripts. Twelve of the 13 verdicts were identical across tiers, and refusal posture did not change at any tier. Response length grew substantially and non-linearly with effort, while a refusal response actually got shorter, indicating that effort deepens permitted answers rather than altering the model's refusal behavior.

Detailed Analysis

A systematic evaluation of Anthropic's Claude Opus 4.7 across three effort tiers (medium, high, and xhigh) has produced a clear empirical finding: effort level controls reasoning depth, not refusal posture. Conducted as the sixth run in a series called CVP (Claude Verification Protocol), the study applied a consistent 13-prompt test suite to generate 39 transcripts, all of which returned clean results. Of the 13 prompts, 12 produced identical verdicts across all three effort tiers. The single divergence was a reduction in ambiguity rather than a change in posture: prompt P02 shifted from an ambiguous "allowed_or_partial" classification at medium effort to a more definitive "confident allowed" at high and xhigh. In this one case, at least, higher reasoning capacity resolved ambiguity toward permissibility rather than restriction. Critically, the 10 prompts that triggered blocked verdicts did so uniformly across all three tiers, with zero variance in outcome.
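The cross-tier comparison described above can be sketched as a small check that flags any prompt whose verdict differs between tiers. This is a minimal illustration, not the study's actual tooling: the function name, the data layout, and the short labels "allowed" and "blocked" are assumptions, and the sample holds only the two prompts the article describes individually (P02 and P03), not the full 13-prompt suite.

```python
from typing import Dict, List

# Effort tiers named in the article.
TIERS = ("medium", "high", "xhigh")


def divergent_prompts(verdicts: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the prompt IDs whose verdict is not identical across all tiers."""
    return sorted(
        prompt_id
        for prompt_id, by_tier in verdicts.items()
        if len({by_tier[tier] for tier in TIERS}) > 1
    )


# Illustrative subset only; verdict label spellings are assumed.
sample = {
    "P02": {"medium": "allowed_or_partial", "high": "allowed", "xhigh": "allowed"},
    "P03": {"medium": "blocked", "high": "blocked", "xhigh": "blocked"},
}

print(divergent_prompts(sample))  # ['P02']
```

Run over a full verdict table, a check like this would return a single entry if the article's 12-of-13 consistency figure holds.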

The depth metrics reveal a non-linear scaling pattern that sets xhigh effort apart as more than an incremental step. Response length grew 10.6% from medium to high, then accelerated to 22.3% from high to xhigh; the two steps compound to a 35.3% spread between the extremes. This acceleration suggests that xhigh effort does not merely extend the same reasoning process but engages an additional tier of elaboration. The finding about prompt P03 is particularly notable: at higher effort, Claude's explicit refusal response was actually 9% shorter than at medium. When thinking more deeply, the model appears to become more economical about declining, and it does not pad refusals with extra tokens or elaborate justifications.
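The 35.3% figure follows from compounding the two step increases rather than adding them (10.6% plus 22.3% would give only 32.9%). A short sketch of that arithmetic, with a helper name chosen here purely for illustration:

```python
def compound_growth(*step_growths_pct: float) -> float:
    """Combine successive percentage increases into one overall percentage."""
    factor = 1.0
    for growth in step_growths_pct:
        factor *= 1 + growth / 100
    return (factor - 1) * 100


medium_to_high = 10.6  # percent, from the article
high_to_xhigh = 22.3   # percent, from the article

total = compound_growth(medium_to_high, high_to_xhigh)
print(f"{total:.1f}%")  # 35.3%
```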

This behavioral profile has meaningful implications for AI safety research and product development. The decoupling of effort from refusal posture answers a non-trivial question for developers building applications on top of Claude: in this evaluation, raising the effort level on a request did not introduce safety drift in either direction. The model's constitutional constraints appear to operate as a layer independent of its reasoning budget, with effort acting as a dial on analytical output. If this characteristic holds across model families and versions, it would be an important architectural property, allowing developers to tune response quality without inadvertently altering the model's compliance envelope.

In the broader context of AI development, this evaluation joins a growing body of independent third-party model auditing that complements Anthropic's own internal safety assessments. The CVP methodology, which uses a stable prompt suite across multiple runs and model versions, enables longitudinal comparisons that single-point benchmarks cannot provide. The researcher's description of themselves as a "non-technical founder" also signals that structured behavioral evaluation is becoming accessible beyond academic and professional AI safety communities, reflecting a democratization of model auditing. As inference-scaling approaches — where models are given more compute at runtime rather than during training — become standard, understanding exactly what runtime resources do and do not affect becomes an increasingly consequential empirical question.

The consistency of results across runs 2 through 6 using the same 13-prompt suite further strengthens confidence in the finding, reducing the likelihood that the clean results are an artifact of prompt selection or evaluation design. Anthropic's effort parameter, as documented in its public API, is explicitly positioned as a token-budget control mechanism, and this external evaluation corroborates that framing at the behavioral level. What the study adds beyond the official documentation is a controlled, versioned, multi-tier empirical record showing that the separation between reasoning depth and safety posture holds under adversarial or edge-case prompt conditions. That distinction matters in practice for any deployment where output quality and policy compliance must be managed independently.
