Does higher effort make Claude refuse more? CVP Run 5 with Opus 4.6 Medium and High Effort levels

A Cyber Verification Program (CVP) run on Claude Opus 4.6 tested 13 cybersecurity-related prompts at medium and high effort levels. Verdicts were identical across both tiers, and all 26 test cases came back clean. Engaged answers grew 29-47% longer at the higher effort level, while refusal responses grew only 11% longer and no prompt changed verdict, which contradicts the community belief that higher effort causes more refusals. An earlier run on Sonnet 4.6 showed the same pattern, suggesting that effort level increases response depth rather than changing refusal behavior.

Detailed Analysis

A self-described non-technical founder running the Cyber Verification Program (CVP) published results from Run 5, a structured evaluation of Claude Opus 4.6 at medium and high effort using the same 13-prompt suite applied in Runs 3 and 4. The headline finding: all 13 prompts produced the same verdict at both effort levels, so all 26 test cases were clean and no request changed from engaged to refused or the reverse. The difference between tiers showed up only in response depth. Engaged answers were 29% to 47% longer at high effort, while refusals were only 11% longer, a gap that points to effort driving elaboration rather than a change in posture. The report completes a four-model Anthropic family scoreboard covering Opus 4.7, Opus 4.6, Sonnet 4.6, and Haiku 4.5 under the CVP methodology, with a full family-comparison synthesis signaled for later publication.

The finding challenges a persistent community assumption that giving Claude a higher effort setting makes it more cautious and more likely to refuse borderline or sensitive prompts. The CVP data now offers two within-run effort comparisons pointing the same direction: Run 4 on Sonnet 4.6 showed the same pattern between high and max effort, and Run 5 replicates it on Opus 4.6 between medium and high. Replication across two distinct models in the same family, at different pairs of effort levels, lends the observation more credibility than a single isolated test would. The practical implication for developers and researchers is that, if effort level governs depth of reasoning rather than the threshold for refusal, applications that need thorough, detailed responses in sensitive domains can use higher effort without losing access to legitimate content.

Anthropic's own safety evaluation data reinforces rather than contradicts this finding. Claude Sonnet 4.6, which shares core safety characteristics with Opus 4.6, showed approximately a 47-fold reduction in false refusals on higher-difficulty benign prompts relative to its predecessor — the category of prompts that use sensitive language, ambiguous framing, or edge-case structure in legitimate contexts. The mechanism is consistent with the CVP observation: expanded computational budget allows the model to more carefully parse user intent, disambiguate edge cases, and distinguish genuinely problematic requests from benign ones that merely surface in proximity to sensitive topics. Fields such as healthcare, legal research, cybersecurity, and education stand to benefit most directly, since these domains routinely generate queries that pattern-match against safety filters without presenting actual harm.

The broader context includes a notable operational episode from early 2026, when Anthropic temporarily defaulted Opus 4.6 to medium effort in response to latency concerns. Users at the time reported the model felt "less intelligent," a perception that some may have conflated with increased refusals. The CVP data helps clarify that distinction: what degraded under medium effort was analytical depth and response quality, not safety posture. The community conflation of "less capable" with "more restrictive" may be the origin of the incorrect folk heuristic that higher effort produces more refusals. The CVP findings suggest the two axes — capability expression and refusal behavior — move independently across effort tiers, at least within the range tested.

The methodological approach itself warrants note. The researcher's consistent use of a fixed 13-prompt suite across runs enables direct cross-model comparison in a way that ad hoc community testing typically cannot. By holding prompt content constant and varying only the model and effort level, the CVP framework isolates the variable of interest with reasonable rigor, even without formal statistical controls. The forthcoming family-synthesis report, spanning all four tested Anthropic models, will be of particular interest to practitioners weighing model selection for cybersecurity verification use cases, as it promises the first systematic cross-model comparison produced under this specific methodology.
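
The comparison described above (same prompts, two effort tiers, verdict agreement plus length growth per verdict class) can be sketched in a few lines of Python. This is a minimal illustration, not the CVP author's tooling: the `Result` record, the `compare_tiers` helper, the verdict labels, and the sample numbers are all hypothetical, and response length is assumed to be measured in characters.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Result:
    """One graded response (hypothetical record layout)."""
    prompt_id: str
    effort: str    # e.g. "medium" or "high"
    verdict: str   # e.g. "engaged" or "refused"
    length: int    # response length; characters assumed here


def compare_tiers(results, low="medium", high="high"):
    """Return (prompts whose verdict changed, mean length growth per verdict)."""
    by_key = {(r.prompt_id, r.effort): r for r in results}
    prompt_ids = sorted({r.prompt_id for r in results})

    flips = []
    growth = {}
    for pid in prompt_ids:
        a, b = by_key[(pid, low)], by_key[(pid, high)]
        if a.verdict != b.verdict:
            # Verdict changed between tiers: record it and skip the length math.
            flips.append(pid)
            continue
        # Relative growth in length from the low tier to the high tier.
        growth.setdefault(a.verdict, []).append(b.length / a.length - 1.0)

    mean_growth = {verdict: mean(vals) for verdict, vals in growth.items()}
    return flips, mean_growth


# Made-up sample data for illustration only; these are NOT the CVP results.
sample = [
    Result("p01", "medium", "engaged", 1000), Result("p01", "high", "engaged", 1300),
    Result("p02", "medium", "engaged", 2000), Result("p02", "high", "engaged", 2800),
    Result("p03", "medium", "refused", 400),  Result("p03", "high", "refused", 440),
]

flips, mean_growth = compare_tiers(sample)
print("verdict changes:", flips)
for verdict, g in sorted(mean_growth.items()):
    print(f"{verdict}: mean length growth {g:.0%}")
```

An empty list of verdict changes corresponds to the "identical verdicts" result in the report, and the per-verdict growth figures correspond to the engaged-versus-refused length comparison. Separating the two measurements is what lets a run distinguish "longer answers" from "more refusals".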
