Detailed Analysis
A recurring regression in Anthropic's Claude Code tool is causing subagents to refuse legitimate code-related tasks due to a malware-safety reminder appended to every "Read" tool call. The reminder, designed to warn Claude against improving or augmenting malicious code, lacks conditional scoping — meaning it fires indiscriminately on every file read, regardless of whether the content is harmful. After scanning a file and confirming it is not malware, subagents nonetheless interpret the safety language as a blanket prohibition on any code writing or augmentation, and abort the session entirely. The problem has been documented across multiple GitHub issues (notably #47027, #49723, #49363, and #52272) and affects at least Claude Code v2.1.111, with a prior fix introduced in v2.1.92 having failed to hold.
The practical consequences for enterprise and developer users are significant. Every aborted session is billed for tokens that produce nothing: Claude performs the malware analysis, consuming compute and cost, and then halts before completing the actual task. For teams using Claude Managed Agents to automate code generation workflows at scale, even a modest per-session failure rate adds up quickly to material financial loss. The original poster explicitly connects the regression to a pattern in which Anthropic has historically prioritized fixes only after public pressure on forums like Hacker News, suggesting that internal bug-tracking and prioritization mechanisms may be insufficient for surfacing issues that erode developer trust and raise operating costs at the same time.
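The cost argument can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the function name and every input figure (session count, failure rate, tokens per aborted session, token price) are hypothetical placeholders, not numbers reported in the GitHub issues or by Anthropic.

```python
def wasted_spend(sessions: int, failure_rate: float,
                 tokens_per_aborted_session: int,
                 usd_per_million_tokens: float) -> float:
    """Estimate spend on sessions that abort before doing the real task.

    All parameters are assumptions supplied by the caller; none of them
    comes from the source article.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    aborted_sessions = sessions * failure_rate
    wasted_tokens = aborted_sessions * tokens_per_aborted_session
    return wasted_tokens * usd_per_million_tokens / 1_000_000


# Made-up inputs: 10,000 sessions a month, 5% aborted,
# 20,000 tokens burned per aborted session, $10 per million tokens.
monthly_waste = wasted_spend(10_000, 0.05, 20_000, 10.0)
print(f"{monthly_waste:.2f} USD wasted per month")  # prints "100.00 USD wasted per month"
```

The waste scales linearly with each input, so a team can substitute its own session volume and pricing to see whether the failure rate is a rounding error or a budget line.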
The root cause reflects a broader engineering challenge in prompt-based AI safety design: safety instructions injected into the model's context, whether in the system prompt or in tool-call results, must be precisely scoped to avoid unintended behavioral spillover. The malware reminder serves a legitimate purpose, guarding against Claude being used to enhance malicious software, but its current implementation does not distinguish between the act of identifying malware and the subsequent, unrelated task of writing benign code. This is a classic overgeneralization problem in instruction-following systems: a safety constraint designed for a narrow threat model is interpreted by the model's subagents as a general behavioral constraint, disabling functionality far beyond the intended target.
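The difference between an unscoped and a conditionally scoped reminder can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Claude Code's actual implementation: the reminder wording, the `ReadResult` type, the `flagged_as_malicious` field, and both function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical wording; the real reminder text is not quoted in the source.
UNSCOPED_REMINDER = (
    "Reminder: do not improve or augment malicious code."
)
# Hypothetical scoped variant that states when the rule applies and when it does not.
SCOPED_REMINDER = (
    "Reminder: this file was flagged as potentially malicious. Do not improve "
    "or augment it. Work on unrelated, benign code is unaffected."
)


@dataclass
class ReadResult:
    path: str
    content: str
    flagged_as_malicious: bool  # assumed output of some upstream check


def annotate_unscoped(result: ReadResult) -> str:
    """Append the reminder to every read, regardless of content."""
    return f"{result.content}\n\n{UNSCOPED_REMINDER}"


def annotate_scoped(result: ReadResult) -> str:
    """Append a reminder only when the file was actually flagged."""
    if result.flagged_as_malicious:
        return f"{result.content}\n\n{SCOPED_REMINDER}"
    return result.content


benign = ReadResult("utils.py", "def add(a, b):\n    return a + b", False)
suspect = ReadResult("dropper.py", "# suspicious payload", True)

# Unscoped: even the benign file carries the warning into the subagent's context.
assert UNSCOPED_REMINDER in annotate_unscoped(benign)
# Scoped: the benign file passes through untouched; only the flagged one is annotated.
assert annotate_scoped(benign) == benign.content
assert SCOPED_REMINDER in annotate_scoped(suspect)
```

Gating the injection is only one option. The same effect might be achieved by keeping the reminder on every read but rewording it so that its condition and its limits are explicit, which is what the scoped wording above attempts.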
The timing of this regression is particularly notable given the current threat landscape surrounding Claude Code itself. Security researchers have documented active malware campaigns — including trojanized VS Code extensions and GitHub-lure attacks exploiting leaked Claude Code source code — that specifically target Claude Code users and developers. These campaigns, catalogued by firms including Trend Micro, underscore why Anthropic has strong institutional reasons to maintain robust in-tool safety guardrails. The malware reminder is, in that context, a reasonable defensive measure. However, a safety feature that reliably breaks the tool's core functionality is not a net safety gain; it is a reliability liability that may push users toward workarounds that are themselves less safe.
This episode illustrates the tension between safety guardrails and reliability that Anthropic faces as it scales Claude Code's agentic capabilities as a commercial differentiator. Multi-agent architectures introduce layered instruction contexts (system prompts, tool-call annotations, subagent directives) that interact in ways that are difficult to test exhaustively. A prompt annotation that functions correctly in isolation may cascade unpredictably when interpreted by a subagent operating under a distinct reasoning context. Until Anthropic introduces more robust conditional scoping for safety injections, so that a malware-detection prompt does not propagate as a code-writing prohibition, similar regressions are likely to recur, each one eroding developer confidence in the reliability of the agentic platform around which Anthropic is building its enterprise value proposition.