Tell HN: Opus 4.6/4.7 cyber policy changes break authorized bug bounty workflows

Anthropic's updated cyber usage filters in Claude Opus 4.6/4.7 are blocking authorized bug bounty work despite program scope and authorization language being present in the model's context window. The API-level filters reject tasks within authorized scope, overriding the model's own reasoning that the work constitutes legitimate defensive research. The remediation path through the Cyber Verification Program reportedly favors researchers with public CVEs or conference talks, excluding early-career researchers who have received bug bounty payouts but lack public disclosure history.

Detailed Analysis

Anthropic's release of Claude Opus 4.7 introduced API-level cybersecurity filters that are disrupting legitimate, authorized bug bounty research workflows, a friction point that has drawn sharp criticism from the security community on Hacker News. The filters, which were also retroactively applied to Opus 4.6, operate as classifiers that intercept requests flagged as potential violations of Anthropic's cyber usage policy, regardless of any authorization language present in the session. In one documented case, the model explicitly acknowledged that a task constituted "authorized research under the [Redacted] Bounty program", only for the very next API call to be blocked, which points to a disconnect between the model's own in-context reasoning and the upstream classifier's enforcement logic. Affected researchers report disruptions to active submissions on major platforms like HackerOne, Bugcrowd, and Immunefi, with tasks as routine as drafting vulnerability reports, performing analysis, and refining proof-of-concept code being caught in the filter.

The technical architecture behind these changes stems from Anthropic's internal Project Glasswing, an initiative to manage dual-use cybersecurity risks associated with its most capable models. Opus 4.7 functions as a testbed for these safety controls — it deliberately ships with reduced cyber capabilities compared to Claude Mythos Preview, while incorporating stricter enforcement mechanisms. Opus 4.6 had already introduced six new cybersecurity probes and demonstrated superior investigative performance across 38 of 40 security benchmarks. The 4.7 update builds on this foundation with API-level classifiers designed to detect and block "prohibited or high-risk cybersecurity uses," a category whose precise boundaries Anthropic has not publicly disclosed, leaving researchers unable to anticipate or plan around filter triggers mid-session.

The remediation pathway Anthropic has offered — a Cyber Verification Program colloquially referred to as "the guild" — raises significant access and equity concerns within the security research community. Qualification appears to favor researchers with established public credentials: CVEs, conference presentations, or documented public disclosures. This effectively excludes a large population of early-career and independent researchers who have legitimate, paid track records on major bounty platforms but lack the public footprint that the verification program recognizes. The irony is pointed: AI-assisted research tools are arguably most valuable to researchers who lack institutional resources, yet the exception process is calibrated toward those who need it least. The author and others in the HN thread have proposed concrete remediation steps, including weighting in-context authorization language more heavily in classifier decisions, accepting platform payout histories as verification evidence, and publishing the specific task categories subject to restriction so researchers can adjust their workflows proactively.

This episode reflects a broader and increasingly consequential tension in frontier AI development: the challenge of deploying powerful, general-purpose models in dual-use domains without either over-restricting legitimate professional use or enabling genuine harm. Anthropic's approach here — deploying blunt, context-agnostic classifiers as a safety layer — prioritizes harm prevention at the cost of usability for authorized actors. Competitors like OpenAI and Google DeepMind face structurally identical tradeoffs as their models grow more capable in cybersecurity contexts, and the industry has yet to converge on a satisfying technical or policy architecture for distinguishing authorized from malicious use at inference time. Context-aware authorization, cryptographically verifiable scope attestations, or tiered API access tied to platform-verified credentials represent possible future directions, but none are currently implemented at scale by any major AI provider.
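To make the idea of a "cryptographically verifiable scope attestation" concrete, here is a minimal sketch of what one could look like. Nothing below reflects an existing Anthropic, HackerOne, Bugcrowd, or Immunefi API: the field names, the shared key, and both functions are hypothetical, and HMAC stands in for the asymmetric signatures a real scheme would more likely use, purely to keep the example within Python's standard library.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret between a bounty platform and a model provider.
# A real deployment would more likely use asymmetric signatures; HMAC keeps
# this sketch inside the standard library.
PLATFORM_KEY = b"demo-key-not-a-real-secret"


def _canonical(claims: dict) -> bytes:
    """Serialize claims deterministically so signer and verifier agree."""
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()


def sign_attestation(claims: dict, key: bytes = PLATFORM_KEY) -> dict:
    """Return the claims plus an HMAC-SHA256 tag over their canonical JSON."""
    tag = hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}


def verify_attestation(att: dict, target: str, now: int,
                       key: bytes = PLATFORM_KEY) -> bool:
    """Accept only an untampered, unexpired attestation that covers `target`."""
    expected = hmac.new(key, _canonical(att["claims"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, att["tag"]):
        return False  # claims were altered or signed with a different key
    if now >= att["claims"]["expires_at"]:
        return False  # the authorization window has closed
    return target in att["claims"]["in_scope"]


# Illustrative claims; every field name and value here is an assumption.
attestation = sign_attestation({
    "researcher": "researcher-123",
    "program": "example-bounty",
    "in_scope": ["app.example.com", "api.example.com"],
    "expires_at": 2_000_000_000,  # Unix timestamp
})
```

Under these assumptions, a provider could call `verify_attestation(attestation, "api.example.com", now=int(time.time()))` before a classifier decision, so that scope is checked against something a platform signed rather than against free-text authorization language pasted into the context window. Whether such a signal should influence a safety classifier at all is a policy question the sketch does not answer.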

The practical stakes are real for Anthropic's commercial position in the security research market. Claude has accumulated significant adoption among professional bug bounty hunters and penetration testers, and abrupt mid-workflow breakage — particularly for paying Max subscribers — risks accelerating migration to alternative tooling. More broadly, the incident illustrates that safety policy changes to production AI systems can function as de facto product changes with immediate downstream consequences for professional users, yet are often communicated through model release notes rather than through the kind of advance notice and migration paths that developers and researchers have come to expect from critical infrastructure providers. How Anthropic responds to this feedback — whether by refining classifier granularity, expanding the verification program's evidentiary standards, or improving transparency around filter categories — will likely serve as a signal to the security community about whether Claude remains a viable professional tool in this domain.
