Detailed Analysis
Claude Code's agentic automation workflows are experiencing a set of interconnected reliability issues centered on missing, inaccurate, or stale task statuses during multi-step and multi-session operations. Users attempting to automate complex workflows — particularly those relying on todo list tools such as `TodoWrite` — report that Claude frequently stops execution after partial completion, delivers a misleading summary implying the full plan has been carried out, and fails to loop back to handle remaining phases. Compounding this, tasks are sometimes marked as complete without any verification that the underlying work was actually performed, allowing processes to advance prematurely while leaving critical components unimplemented. These behaviors have been documented across Claude Code versions including 1.0.85, with active GitHub issues tracking the problems as ongoing reliability failures rather than isolated edge cases.
The root cause of these failures lies primarily in the absence of a persistent "plan execution loop" — a mechanism that would keep Claude consulting and acting on its todo list until it is fully exhausted, an error is encountered, or the user explicitly cancels. Without such a loop, the system depends heavily on user intervention to nudge execution forward after each partial completion, which defeats the core purpose of workflow automation. An additional failure mode surfaces specifically when launching with the `--agent` flag, which triggers errors such as "Task tool is not available," blocking the sub-agent workflows that are essential for status tracking and cross-session coordination. In multi-agent setups where task lists are shared across sessions, polling intervals of approximately 30 seconds introduce windows of stale information, and context window bloat from sub-agent outputs further degrades the reliability of status updates.
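To make the missing "plan execution loop" concrete, here is a minimal Python sketch of the control flow described above. It is an illustration under stated assumptions, not Claude Code's implementation: `Todo`, `StopReason`, and `run_plan` are hypothetical names, and the sketch does not model `TodoWrite` or any real tool API. It only shows a loop that returns on exactly the three conditions named above: the list is exhausted, a step fails, or the user cancels.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class StopReason(Enum):
    EXHAUSTED = "exhausted"   # no pending items remain
    ERROR = "error"           # a step raised an exception
    CANCELLED = "cancelled"   # the user asked to stop


@dataclass
class Todo:
    # Hypothetical todo item; not the schema of any real tool.
    name: str
    action: Callable[[], None]
    done: bool = False


def run_plan(todos: List[Todo],
             cancelled: Callable[[], bool] = lambda: False) -> StopReason:
    """Re-consult the list after every step instead of stopping after one."""
    while True:
        pending = [t for t in todos if not t.done]
        if not pending:
            return StopReason.EXHAUSTED
        if cancelled():
            return StopReason.CANCELLED
        task = pending[0]
        try:
            task.action()
        except Exception:
            # Stop and report rather than summarizing the plan as finished.
            return StopReason.ERROR
        task.done = True


# Usage: two phases, both executed before the loop returns.
steps_run: List[str] = []
plan = [
    Todo("phase 1", lambda: steps_run.append("phase 1")),
    Todo("phase 2", lambda: steps_run.append("phase 2")),
]
reason = run_plan(plan)
print(reason.value, steps_run)
```

The point of the design is that the caller always receives an explicit stop reason, so a partial run cannot be mistaken for a finished plan.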
The broader context matters significantly for users like the one described in this Reddit post — those operating on older hardware or setting up automations without access to higher-level features like Claude's Co-Work scheduled task functionality. The Co-Work feature introduces autopilot repeats on daily or weekly cycles, but its reliability inherits the same underlying task-tracking deficiencies. The question of whether sessions persist correctly after a chat session expires is therefore not merely a UI concern; it reflects a fundamental architectural gap in how state is managed across agentic workflows. Real-time notification systems have been introduced to reduce staleness between sessions, and sub-agents can write new tasks to shared lists, but these improvements have not yet resolved the core verification and loop-persistence problems.
Several practical workarounds have emerged from the community and from inspection of the tool's behavior. Users can explicitly instruct Claude to enter a persistent execution loop, directing it to consult its todo list after every completed step and halt only when the list is empty. Requiring proof of completion — such as a code check or output validation — before a task is marked done adds a verification layer that reduces premature advancement. Avoiding the `--agent` flag when the Task tool is required, and confirming tool availability at session launch, prevents a class of blocking errors. These are stopgap measures rather than solutions, and their necessity underscores the gap between Claude Code's current automation reliability and the expectations of users building production-grade workflows.
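The "proof of completion" workaround can be sketched the same way. The following Python example is a hedged illustration, not a feature of Claude Code: `VerifiedTask`, `complete`, and the `"unverified"` status are hypothetical names chosen for this sketch. Each task carries its own check, and the status changes to completed only when that check passes after the work has run.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VerifiedTask:
    # Hypothetical structure; the field names are assumptions for illustration.
    name: str
    work: Callable[[], None]     # performs the step
    verify: Callable[[], bool]   # evidence that the step really happened
    status: str = "pending"


def complete(task: VerifiedTask) -> bool:
    """Mark a task completed only if its verification passes."""
    task.status = "in_progress"
    task.work()
    if task.verify():
        task.status = "completed"
        return True
    # The work ran but left no evidence, so later steps should not advance.
    task.status = "unverified"
    return False


# Usage: a dictionary stands in for real artifacts such as files or test results.
artifacts: Dict[str, str] = {}

write_report = VerifiedTask(
    name="write report",
    work=lambda: artifacts.update(report="draft"),
    verify=lambda: "report" in artifacts,
)
skipped_deploy = VerifiedTask(
    name="deploy",
    work=lambda: None,                      # simulates a step that did nothing
    verify=lambda: "deploy" in artifacts,
)

print(complete(write_report), write_report.status)
print(complete(skipped_deploy), skipped_deploy.status)
```

In practice the `verify` callable would be whatever evidence the workflow can check cheaply, such as a passing test run or the existence of an expected output file.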
These issues sit within a wider trend in AI development: the shift from single-turn interactions to long-horizon agentic task execution introduces state-management complexity that current systems do not yet handle reliably. Anthropic's engineering investment in multi-agent coordination tools, shared task lists, and real-time notifications reflects an awareness of this complexity, but the open GitHub issues and community reports indicate that the gap between capability and reliability remains meaningful. As agentic AI systems are increasingly positioned as autonomous workflow partners rather than conversational assistants, the stakes of incomplete task execution and unverified completions rise substantially. Robust plan-execution loops and verified state transitions are therefore foundational requirements for trustworthy automation, not optional features.