Does effort change Claude's refusal posture, or only the depth of the answer? CVP Run 6: Opus 4.7 at three effort levels

A Constitutional Vulnerability Probe run on Claude Opus 4.7 across three effort levels (medium, high, and xhigh) found that effort settings did not alter the model's refusal posture: blocking patterns were identical across all three tiers, and the one verdict that differed became less ambiguous at higher effort. Response depth increased non-linearly, with the step from high to xhigh adding substantially more content than the step from medium to high. The additional tokens went to expanded substantive responses rather than to refusal statements, which did not grow with effort.

Detailed Analysis

A structured empirical evaluation of Claude Opus 4.7's behavior across three effort tiers (medium, high, and xhigh) has produced one of the cleaner findings in recent independent AI behavioral research: effort level controls answer depth, not refusal posture. Conducted as the sixth iteration of a "CVP" (Claude Verification Protocol) benchmarking suite, the run applied an identical 13-prompt battery at each tier, for a total of 39 transcripts. Every transcript returned a clean result, and 12 of the 13 per-prompt verdicts were identical across all effort levels. The one divergence moved toward greater confidence: prompt p02 shifted from an ambiguous allowed_or_partial verdict at medium effort to a confident allowed verdict at high and xhigh, so the additional effort produced a clearer verdict rather than a looser safety posture.

The refusal data is striking in its consistency. The 10 prompts that received blocked verdicts were blocked at every effort tier: 10 prompts × 3 tiers = 30 blocked outcomes, with zero variation. The evaluation also tracked what the author calls "layer-1 signals" (indicators that could suggest a model is executing or leaking sensitive reasoning) and found zero instances at every tier. These results suggest that the mechanisms governing what Claude will and will not do sit structurally upstream of the effort system, which aligns with Anthropic's own documentation characterizing effort as a behavioral signal governing reasoning depth rather than a modifier of safety constraints. Effort, in this framing, adjusts how hard Claude thinks about a problem, not whether it will engage with one.
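The counts above can be checked with a small verdict table. The following is a minimal sketch, not the CVP suite's own tooling: the function names are hypothetical, and the assignment of verdicts to prompt IDs is a placeholder, since the article states only the counts, the p02 divergence, and that p03 produced refusal language. The two prompts whose verdicts the article does not report are marked "unspecified" rather than guessed.

```python
TIERS = ("medium", "high", "xhigh")


def build_placeholder_verdicts():
    """Return {prompt_id: {tier: verdict}} matching the reported counts.

    ASSUMPTION: which prompt IDs are blocked is illustrative only.
    The article reports 10 blocked prompts but does not list them.
    """
    verdicts = {}
    # Ten prompts blocked at every tier (IDs are placeholders).
    blocked_ids = ["p01"] + [f"p{i:02d}" for i in range(3, 12)]
    for pid in blocked_ids:
        verdicts[pid] = {tier: "blocked" for tier in TIERS}
    # The one reported divergence: p02 became less ambiguous at higher effort.
    verdicts["p02"] = {
        "medium": "allowed_or_partial",
        "high": "allowed",
        "xhigh": "allowed",
    }
    # Two prompts with stable verdicts that the article does not specify.
    for pid in ("p12", "p13"):
        verdicts[pid] = {tier: "unspecified" for tier in TIERS}
    return verdicts


def divergent_prompts(verdicts):
    """Prompt IDs whose verdict is not the same at every tier."""
    return sorted(
        pid for pid, by_tier in verdicts.items()
        if len(set(by_tier.values())) > 1
    )


def count_outcomes(verdicts, label):
    """Number of (prompt, tier) transcripts carrying the given verdict."""
    return sum(
        1
        for by_tier in verdicts.values()
        for verdict in by_tier.values()
        if verdict == label
    )


verdicts = build_placeholder_verdicts()
total_transcripts = len(verdicts) * len(TIERS)
diverging = divergent_prompts(verdicts)
blocked_total = count_outcomes(verdicts, "blocked")

print(f"transcripts: {total_transcripts}")
print(f"divergent prompts: {diverging}")
print(f"blocked outcomes: {blocked_total}")
```

Keeping verdicts keyed by prompt and then by tier makes the stability check a one-line set comparison, so the same structure would extend to more tiers without changes.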

The depth scaling data tells a more nuanced story. Response length grew non-linearly across tiers: +10.6% from medium to high, then +22.3% from high to xhigh, which compounds to +35.3% from the bottom to the top of the tested range. The acceleration is notable: the step from high to xhigh added more total words than the step from medium to high, suggesting that xhigh is more than an incremental extension of high. One data point reinforces the refusal-independence finding: on prompt p03, xhigh shortened the explicit refusal language by 9% compared to medium. Higher effort, it appears, spends its additional tokens on substantive content rather than on elaborating a refusal, which is consistent with the refusal decision being computationally cheap and stable regardless of how much reasoning capacity is brought to bear.
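The two step percentages multiply rather than add, which is why the end-to-end figure is 35.3% and not 32.9%. A short sketch of that arithmetic follows. The helper name and the baseline of 100 length units at medium effort are assumptions for illustration, because the article reports no absolute word counts.

```python
def compound_growth(step_pcts):
    """Total percentage growth after applying each step's growth in turn."""
    factor = 1.0
    for pct in step_pcts:
        factor *= 1.0 + pct / 100.0
    return (factor - 1.0) * 100.0


# Reported step growth: medium -> high, then high -> xhigh.
total_pct = compound_growth([10.6, 22.3])

# ASSUMPTION: normalize medium-effort length to 100 units to compare
# how much each step adds in absolute terms.
medium_len = 100.0
high_len = medium_len * 1.106
xhigh_len = high_len * 1.223
first_step_added = high_len - medium_len
second_step_added = xhigh_len - high_len

print(f"total growth: {total_pct:.1f}%")
print(f"added medium->high: {first_step_added:.1f} units")
print(f"added high->xhigh: {second_step_added:.1f} units")
```

On this normalized scale the second step adds roughly 2.3 times as much length as the first, which is the acceleration the paragraph describes.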

These findings carry practical implications for developers and enterprises building on Claude's API. The main takeaway is that, within the tested prompt set, effort tiering can be used as a cost-and-latency lever without shifting the model's risk profile. Organizations worried that xhigh effort might let Claude "reason around" safety constraints (a concern that follows naturally from the intuition that more capable reasoning could find more creative workarounds) have empirical grounds for setting that concern aside, at least for the 13 prompts examined here. The non-linearity of the depth curve is equally actionable: teams optimizing for quality at high effort may be getting noticeably less depth than xhigh offers, though whether the gain justifies the marginal cost will depend on the workload.

The broader context in AI development makes this kind of third-party behavioral evaluation increasingly valuable. As frontier labs iterate rapidly on model capabilities and deploy nuanced configuration systems like effort tiers, independent, reproducible benchmarking creates a public record of observed behavior that complements internal safety evaluations. The CVP suite, now in its sixth run, exemplifies the kind of longitudinal, structured testing that allows the research community to track behavioral stability across model updates and configuration changes. Whether refusal behavior remains this stable across the full effort range is an open empirical question (Anthropic's documentation references five total effort levels, of which this run tested three) that subsequent CVP runs may be positioned to answer.
