Second CVP run is up, had opus 4.7 grade anthropic's own claude verified provider program

A developer completed a second evaluation run under Anthropic's Cyber Verification Program (CVP), testing 13 prompts (3 baselines from the first run and 10 new detection-pattern probes). Two prompts were allowed, 10 were blocked, and 1 was flagged as a taxonomy call. The run recorded a usefulness score of 4.85/5 and a safety rating of 13/13 clean, with the decision log and all prompts and responses published on the report page. A third run was scheduled for the following week.

Detailed Analysis

A non-technical founder who began coding in February 2026 has published a second evaluation run under Anthropic's Cyber Verification Program (CVP), using Claude Opus 4.7 as the grading model to assess prompt safety and usefulness across a structured battery of cybersecurity-adjacent probes. The evaluation comprised 13 prompts: three carried over from the first run to maintain comparability, and ten new probes mapped to detection patterns the author had shipped in the preceding two weeks. Of these, 2 prompts were allowed, 10 were blocked, and one edge case (designated P7) was logged as a taxonomy classification issue rather than a safety failure. The run recorded a usefulness score of 4.85 out of 5 and a clean 13/13 safety record, and the full decision log, prompts, and responses are published at sunglasses.dev.
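The reported totals can be cross-checked with a short tally. This is a minimal sketch, not the author's tooling: the outcome labels, the `summarize_run` helper, and the list layout are all hypothetical, and only the counts (2 allowed, 10 blocked, 1 flagged, across 3 baselines plus 10 new probes) come from the report.

```python
from collections import Counter

# Totals reported for the second run. How outcomes map onto individual
# prompt IDs is not reproduced here, so this list is a placeholder layout.
REPORTED_OUTCOMES = ["allowed"] * 2 + ["blocked"] * 10 + ["taxonomy_flag"] * 1
BASELINE_PROMPTS = 3   # carried over from the first run
NEW_PROBES = 10        # mapped to recently shipped detection patterns


def summarize_run(outcomes, expected_total):
    """Count outcomes and verify they account for every prompt exactly once."""
    tally = Counter(outcomes)
    total = sum(tally.values())
    if total != expected_total:
        raise ValueError(f"{total} outcomes recorded, expected {expected_total}")
    return dict(tally)


summary = summarize_run(REPORTED_OUTCOMES, BASELINE_PROMPTS + NEW_PROBES)
print(summary)  # {'allowed': 2, 'blocked': 10, 'taxonomy_flag': 1}
```

The check confirms that the three outcome buckets sum to the 13 prompts in the battery, so no prompt is left unaccounted for in the summary figures.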

The CVP is Anthropic's structured mechanism for allowing verified cybersecurity professionals (including penetration testers, vulnerability researchers, and red teamers) to access Claude's capabilities with adjusted guardrails that would otherwise block high-risk technical prompts. By building a formal verification layer rather than loosening restrictions globally, Anthropic attempts to reconcile two competing pressures: keeping powerful AI models useful for legitimate security work while preventing misuse by bad actors. The author's methodology, which keeps baseline prompts fixed across runs and maps new probes to real-world detection patterns, reflects a disciplined approach to longitudinal safety testing. That it comes from someone with only a few months of coding experience signals how accessible structured AI evaluation work is becoming to non-expert practitioners.

The choice of Claude Opus 4.7 as the grading model is notable given its April 16, 2026 release date, which makes this one of the earliest documented third-party CVP evaluations involving the model. Opus 4.7 introduced meaningful improvements in alignment-adjacent capabilities, including better instruction following, stronger prompt injection resistance, and enhanced multi-step tool use. All of these are directly relevant in a CVP context, where the model must reliably distinguish legitimate security research queries from genuinely harmful requests. The model's reported 10–15% gains on agentic benchmarks and its improved performance on complex, multi-turn workflows suggest it is better positioned than its predecessor to handle the nuanced judgment calls that CVP scenarios demand.

The P7 edge case flagged by the author highlights a genuine challenge in any safety evaluation framework: the boundary between a classification taxonomy failure and an actual safety failure is not always clean. How that boundary is drawn has real consequences for both false positive and false negative rates in production deployments. The author's decision to log P7 as a classification issue rather than a safety failure, and to invite community feedback specifically on that call, reflects an intellectually honest approach to the ambiguity inherent in prompt safety work. This kind of public, iterative documentation is relatively rare in the CVP ecosystem and could serve as a useful data point for Anthropic and the broader security research community as they refine verification criteria.
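To make the stakes of that call concrete, the arithmetic below shows how the headline safety figure would move if the single edge case were counted the other way. It is an illustrative sketch only: the `clean_fraction` helper is hypothetical, and the second scenario is a counterfactual, not a result from the report.

```python
TOTAL_PROMPTS = 13  # size of the battery, from the report


def clean_fraction(total, safety_failures):
    """Return (clean_count, clean_rate) for a run with the given failure count."""
    if not 0 <= safety_failures <= total:
        raise ValueError("safety_failures must be between 0 and total")
    clean = total - safety_failures
    return clean, clean / total


# As published: P7 is logged as a taxonomy issue, so no safety failures.
clean, rate = clean_fraction(TOTAL_PROMPTS, 0)
print(f"{clean}/{TOTAL_PROMPTS} ({rate:.1%})")  # 13/13 (100.0%)

# Counterfactual: the same edge case counted as a safety failure instead.
clean, rate = clean_fraction(TOTAL_PROMPTS, 1)
print(f"{clean}/{TOTAL_PROMPTS} ({rate:.1%})")  # 12/13 (92.3%)
```

With only 13 prompts, one reclassification shifts the clean rate by almost eight percentage points, which is why the labeling decision deserves the scrutiny the author invited.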

More broadly, this evaluation sits at the intersection of two significant trends in AI development: the professionalization of red-teaming and safety evaluation as a discipline, and the democratization of that work to individuals outside traditional research institutions. That a self-described non-technical founder, coding for only two months, is running structured, reproducible CVP evaluations and publishing transparent decision logs speaks to how rapidly the tooling and conceptual frameworks around AI safety testing have matured. As Anthropic moves toward more capable models, including the Mythos-class systems referenced in research context, community-generated safety benchmarks like this one, built from real-world detection patterns rather than synthetic lab conditions, are likely to grow considerably in importance.
