How do you make Claude web automation work AFTER training it?

A user trained Claude to automate compliance document downloads from a vendor portal but encountered refusal when attempting to deploy the task as a scheduled automation. Claude's safety protocols rejected the automation, asking verification questions about authorization and ultimately refusing to execute programmatic downloads from third-party portals. Claude's response emphasized that legitimate access to secure portal documents requires direct manual navigation rather than automated scripts that could bypass security measures.

Detailed Analysis

A Reddit user's frustrated account of Claude refusing to execute a pre-built web automation workflow for compliance document downloads illustrates one of the most significant friction points in deploying Claude as an agentic tool: Claude can behave one way during a supervised, exploratory session and another way when it executes the same workflow autonomously as a scheduled task. The user reports spending 45 minutes and a substantial portion of their token budget developing a working automation routine with Claude, with Claude itself devising the steps. When the workflow was packaged and relaunched, Claude refused to execute it, citing concerns about social engineering, unauthorized portal access, and embedded attack patterns. Claude's refusal message, which the user partially reproduces, treats the very legitimacy context the user provided (credentials, regulatory references, company names, sequential download logic) as hallmarks of a social engineering attempt, effectively penalizing the user for being thorough.

The core technical issue appears to stem from how Claude evaluates context across session boundaries. Within the initial exploratory session, Claude operates with accumulated conversational context that informs its trust calibration: the user's demonstrated intent, the organic development of the workflow, and the iterative back-and-forth that establishes legitimacy. When that workflow is serialized into a scheduled routine and re-executed, Claude receives a dense, pre-written instruction set that, stripped of its conversational history, structurally resembles a prompt injection or social engineering payload. Programmatic downloads via the Fetch API, blob URL tricks, and credential-bearing requests to third-party portals are all recognized attack vectors, and Claude's safety training flags their co-occurrence with elaborate legitimacy claims as suspicious, precisely because that combination is a known adversarial pattern. The irony the user identifies, that Claude is refusing to follow instructions Claude itself wrote, reflects a genuine architectural challenge: the executing instance of Claude does not inherit the trust established in the authoring session.

This problem is not unique to this user's experience and sits at the center of ongoing debates about how AI safety constraints interact with agentic deployability. Anthropic's own guidance on computer use and agentic workflows acknowledges that Claude applies heightened scrutiny to actions with real-world consequences, particularly those involving credential use, file downloads from third-party portals, and JavaScript execution. Those categories overlap almost entirely with legitimate enterprise compliance automation. The tension is structural: the same features that make Claude cautious enough to resist actual social engineering attacks also make it resistant to legitimate automation that superficially resembles those attacks. Claude Routines and Claude Code with Playwright represent Anthropic's emerging answer to this problem, offering execution environments where authorization context can be more reliably established and maintained. These tools are still maturing, however, and the user's experience suggests that the scaffolding for persistent trust between the authoring and execution phases remains inadequate for production use cases.

The broader implication for the AI automation ecosystem is that current large language model safety architectures were largely designed around single-turn or short-context interactions, and the transition to multi-session agentic deployment exposes gaps in how authorization, identity, and intent are represented and persisted. Traditional RPA tools execute deterministic scripts with no independent judgment. Claude, by contrast, re-evaluates the legitimacy of each action at inference time, which introduces the possibility of runtime refusal even for pre-approved workflows. This is arguably a safety feature, but it becomes a liability when the re-evaluation mechanism lacks access to the original authorization context. The industry-wide solution likely involves some form of cryptographically verifiable or system-prompt-level authorization that survives session boundaries, analogous to how OAuth tokens persist user consent across API calls. Until such mechanisms are standardized and integrated into platforms like Claude.ai, users attempting to deploy Claude for sensitive but legitimate automation tasks will continue to encounter the paradox this Reddit post describes: a model that collaboratively builds a workflow it will subsequently refuse to run.
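
To make the idea of authorization that survives session boundaries concrete, here is a minimal sketch of a signed authorization manifest, using only the Python standard library. It is an illustration of the general technique (an HMAC over a canonical JSON record, plus an expiry check), not a feature of Claude or any Anthropic product. Every name in it is hypothetical: the manifest fields, the `sign_manifest` and `verify_manifest` helpers, the host `portal.example.com`, and the demo key. A scheduler or harness could run a check like this before launching a stored workflow; whether the model itself would then accept the workflow is a separate question this sketch does not address.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    # sort_keys plus fixed separators give a stable byte string for the same dict.
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes,
                    now: Optional[float] = None) -> bool:
    """Accept the manifest only if the signature matches and it has not expired."""
    expected = sign_manifest(manifest, key)
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, signature):
        return False
    current = time.time() if now is None else now
    return current < manifest["expires_at"]


# Hypothetical record created when the user approves the workflow in the authoring session.
KEY = b"demo-key-not-for-production"
manifest = {
    "workflow_id": "compliance-downloads",
    "allowed_hosts": ["portal.example.com"],
    "allowed_actions": ["login", "download"],
    "approved_by": "user@example.com",
    "expires_at": 1_900_000_000,  # Unix timestamp after which approval lapses
}
signature = sign_manifest(manifest, KEY)
```

At execution time the harness would call `verify_manifest(manifest, signature, KEY)` and refuse to start if the record was altered or has expired. Like an OAuth token, the record carries the scope of what was approved (hosts, actions, approver) rather than a free-text claim of legitimacy, which is the property the paragraph above argues is missing.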
