
Detect Failed, Partial and Stalled Agents

Learn the failure taxonomy for agent runs: hard failure, partial success, stalled, and silently wrong. Then match observable signals (HTTP codes, traces, eval scores) to each mode so the orchestrator can react correctly.

After this topic, you'll be confident about Hard failure, Partial success, Stalled agent and Silent failure.


An agent run that ends "successfully" isn't the same as a successful run. The exam expects you to recognise four distinct failure modes and the signals that distinguish them, because each mode demands a different recovery path.

The four failure modes

| Mode | Process state | Output state | Typical signals |
| --- | --- | --- | --- |
| Hard failure | Crashed or exited non-zero | None or partial | Exception traceback, 5xx, runner timeout |
| Partial success | Exited cleanly, early | Some artifacts present | Exit 0 with a checkpoint but no final deliverable |
| Stalled | Still running | No progress | Repeated identical tool calls, flat artifact graph, heartbeat-only logs |
| Silent failure | Exited cleanly | Wrong | Logs claim success; eval / downstream metric disagrees |
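One way to make the table operational is a small rule-based classifier. The sketch below is hypothetical: `RunObservation`, its fields, `classify`, and the extra `RUNNING` / `UNVERIFIED` / `VERIFIED` states are assumptions for illustration, not part of any particular framework. It checks liveness first, then exit status, then the deliverable, then the external eval, and it refuses to call an unevaluated clean exit a success.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    # The four failure modes from the table, plus three assumed non-failure states.
    HARD_FAILURE = "hard_failure"
    PARTIAL_SUCCESS = "partial_success"
    STALLED = "stalled"
    SILENT_FAILURE = "silent_failure"
    RUNNING = "running"        # alive and advancing
    UNVERIFIED = "unverified"  # clean exit, deliverable present, no eval yet
    VERIFIED = "verified"      # clean exit and the external eval passed


@dataclass
class RunObservation:
    # Hypothetical fields; map them to whatever your runner and evaluator expose.
    still_running: bool
    exit_code: Optional[int]         # None while running, or if no exit code was captured
    making_progress: bool            # e.g. new artifacts or varied tool calls recently
    final_deliverable_present: bool
    eval_passed: Optional[bool]      # None if no external eval has run


def classify(obs: RunObservation) -> Mode:
    # Process still alive: it is either advancing or stalled.
    if obs.still_running:
        return Mode.RUNNING if obs.making_progress else Mode.STALLED
    # Abnormal termination (non-zero or missing exit code): hard failure.
    if obs.exit_code != 0:
        return Mode.HARD_FAILURE
    # Clean exit without the final deliverable: partial success.
    if not obs.final_deliverable_present:
        return Mode.PARTIAL_SUCCESS
    # Clean exit with a deliverable, but the external check disagrees: silent failure.
    if obs.eval_passed is False:
        return Mode.SILENT_FAILURE
    # "No exceptions" alone is not success; only a passing eval counts as verified.
    return Mode.VERIFIED if obs.eval_passed else Mode.UNVERIFIED
```

The order of the checks matters: an eval verdict is only meaningful once the process has exited cleanly and produced something to evaluate.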

The mistake every team makes once: treating "no exceptions" as success. Silent failures by definition produce no exceptions. Catching them needs an external check (an eval, a downstream metric, a human spot-check), not a more verbose log.

What an orchestrator should monitor

  1. Liveness: a heartbeat from the agent process.
  2. Progress: are new artifacts being produced, and are tool calls diverse?
  3. Resource budget: tokens, wall-clock, tool quota.
  4. Outcome validation: did the post-action eval pass?
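The four checks above can be combined into a single polling function. This is a minimal sketch assuming the orchestrator records timestamps and counters for each run; `RunState`, `Limits`, `check_run`, and every threshold value are hypothetical. To keep it short, progress is approximated by artifact recency only; tool-call diversity is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Limits:
    # Hypothetical thresholds; tune per workload.
    heartbeat_timeout_s: float = 30.0
    progress_timeout_s: float = 300.0
    max_tokens: int = 200_000
    max_wall_clock_s: float = 3600.0
    max_tool_calls: int = 500


@dataclass
class RunState:
    started_at: float
    last_heartbeat_at: float
    last_artifact_at: float
    tokens_used: int
    tool_calls: int
    eval_passed: Optional[bool] = None  # set by an external evaluator after the run


def check_run(state: RunState, now: float, limits: Limits) -> List[str]:
    """Return the names of every monitor that is currently firing."""
    alerts: List[str] = []
    # 1. Liveness: has the process sent a heartbeat recently?
    if now - state.last_heartbeat_at > limits.heartbeat_timeout_s:
        alerts.append("liveness")
    # 2. Progress: only checked if alive; alive but no new artifact is a stall.
    elif now - state.last_artifact_at > limits.progress_timeout_s:
        alerts.append("progress")
    # 3. Resource budget: tokens, wall-clock, tool quota.
    if (state.tokens_used > limits.max_tokens
            or now - state.started_at > limits.max_wall_clock_s
            or state.tool_calls > limits.max_tool_calls):
        alerts.append("budget")
    # 4. Outcome validation: only an external eval can raise this one.
    if state.eval_passed is False:
        alerts.append("outcome")
    return alerts
```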

A stall is the gap between liveness and progress: the agent is alive but not advancing. Two common detectors are "N identical tool calls in a row" and "no new artifact written in T seconds while the agent is still consuming tokens."
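Both heuristics fit in a small stateful detector. This is a sketch under stated assumptions: `StallDetector`, its method names, and the default thresholds (3 identical calls, 120 seconds) are hypothetical, and tool-call arguments are assumed to be JSON-serialisable so they can be compared canonically.

```python
import json
from collections import deque
from typing import Any, Deque, Dict, Tuple


class StallDetector:
    """Flags a stall using the two heuristics above. Thresholds are hypothetical."""

    def __init__(self, started_at: float, max_identical_calls: int = 3,
                 artifact_timeout_s: float = 120.0) -> None:
        self.max_identical_calls = max_identical_calls
        self.artifact_timeout_s = artifact_timeout_s
        # Sliding window of the most recent tool calls, as (name, canonical args).
        self.recent_calls: Deque[Tuple[str, str]] = deque(maxlen=max_identical_calls)
        self.last_artifact_at = started_at
        self.tokens_used = 0
        self.tokens_at_last_artifact = 0

    def record_tool_call(self, name: str, args: Dict[str, Any]) -> None:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal.
        self.recent_calls.append((name, json.dumps(args, sort_keys=True)))

    def record_tokens(self, n: int) -> None:
        self.tokens_used += n

    def record_artifact(self, now: float) -> None:
        self.last_artifact_at = now
        self.tokens_at_last_artifact = self.tokens_used

    def is_stalled(self, now: float) -> bool:
        # Heuristic 1: the last N tool calls are identical (same name and args).
        window_full = len(self.recent_calls) == self.max_identical_calls
        if window_full and len(set(self.recent_calls)) == 1:
            return True
        # Heuristic 2: no new artifact in T seconds while tokens are still being spent.
        idle_s = now - self.last_artifact_at
        still_spending = self.tokens_used > self.tokens_at_last_artifact
        return idle_s > self.artifact_timeout_s and still_spending
```

The token condition is the design choice worth noting: an agent waiting on a slow tool is idle but not spending tokens, while a looping agent is idle on artifacts but busy on tokens. A wall-clock timeout alone cannot tell those two apart.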

Exam tip: The single highest-leverage detector is outcome validation. Liveness/progress/budget catch the loud failures; only an external truth signal catches silent ones.


Where this shows up on the exam

Questions on failure detection usually present a log excerpt or a metrics description and ask which failure mode is in play and what the orchestrator should do next. If the run looks clean but the outcome looks wrong, the answer involves an external eval. If the run looks alive but flat, the answer involves a stall detector. Hard failures and partial successes are the easier two; focus your prep on stalls and silent failures.

Anchor concepts

Key terms

Hard failure
The agent terminates abnormally: an exception, a non-zero exit, or an infrastructure timeout. The orchestrator sees a clear error.
Partial success
Some steps completed but the goal was not reached. The agent stopped, leaving the state somewhere between the starting point and the desired end.
Stalled agent
The agent is still running but is making no observable progress: repeated identical tool calls, no new artifacts, heartbeat-only logs.
Silent failure
The agent claims success but the outcome log or evals show the task was not actually completed. The most dangerous mode.
Watch out

Common pitfalls

  • Treating 'no errors logged' as success; silent failures by definition produce clean logs.
  • Using wall-clock timeout as the only stall detector; a long-running tool call looks identical to a loop.
  • Counting tool-call success as task success without an eval or downstream observation.
  • Letting an agent retry the same failing tool call indefinitely; with no backoff or attempt cap, a 5xx becomes a budget incident.
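A bounded retry addresses the last pitfall. The sketch below assumes a hypothetical `TransientError` that your tool wrapper raises on a 5xx; `call_with_backoff`, the attempt cap, the delays, and the full-jitter choice are illustrative defaults, not values the exam prescribes. When attempts run out it re-raises, so an endless loop becomes a visible hard failure.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as an HTTP 5xx."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 4,
                      base_delay_s: float = 0.5, max_delay_s: float = 8.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fn on TransientError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                # Out of attempts: surface a hard failure instead of looping.
                raise
            # Delay doubles each attempt, capped at max_delay_s.
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))  # full jitter
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the function testable without real waits; in production the default `time.sleep` applies.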