Safe Execution: Retries, Rollbacks and Escalation
Tool calls fail. Networks blip. Tests go red. Safe execution is the discipline of deciding โ before you ship โ when to retry, when to roll back and when to stop and ask a human. Learn the policies that keep autonomous agents inside their lane.
Safe Execution: Retries, Rollbacks and Escalation
Tools fail. Sometimes the network blips, sometimes the API is genuinely broken, sometimes the agent's plan is wrong. Safe execution is the policy that says, in advance, what happens for each class of failure โ so the agent's behaviour at 3am is the same as the behaviour the team approved at 3pm.
The three policies you must declare
| Policy | What it answers | Concrete example | | --- | --- | --- | | Retry | When and how many times do we try again? | Idempotent GET: up to 3 with exponential backoff. POST: at most 1 with verification. | | Rollback | If we committed a side-effect we regret, how do we undo it? | Auto-open a revert PR; restore previous deploy artifact. | | Escalation | When do we stop trying and ask a human? | Loop budget exhausted, guardrail denied, no plan progress for 2 reflect cycles. |
Each policy should be machine-readable. An "ask the on-call engineer" runbook is not a policy; an agent-callable escalate(reason, context) tool is.
Walk a failing scenario
Choose your own outcome
Your agent is on iteration 4 of fixing a flaky test. The `merge_pull_request` tool just returned a 502 error from GitHub. Test status is green and the branch is protected. Walk the next move.
The merge tool returned 502. What is the immediate next step?
Where this shows up on the exam
GH-600 returns to a small set of trigger words: "retry", "rollback", "escalate", "loop", "idempotent". When you see them, anchor on the rule: retries only for idempotent operations, rollbacks must be automated and reviewable, escalation triggers must be observable not self-reported. Avoid every option that hides the failure.
Key terms
- Idempotent retry
- A retry policy that is only applied to operations whose repeated execution has the same effect as a single execution (GETs, PUTs on a stable key).
- Rollback plan
- The pre-declared, automated way to undo the side-effect a tool call produced (revert PR, restore snapshot, redeploy previous artifact).
- Escalation
- Handing control back to a human when the agent has hit its retry/loop budget, confidence threshold or a guardrail trigger.
- Loop budget
- A hard ceiling on the number of plan/act/observe cycles or total tool calls per task โ the cheapest defence against runaway agents.
Common pitfalls
- Retrying a non-idempotent write (a POST that creates an issue) on every error โ you ship N duplicate issues and look incompetent.
- Treating 'rollback' as a manual runbook to be read at 2am instead of an automated, agent-callable tool.
- No loop budget โ the agent retries forever, racks up a cost bill and never escalates to the human who could have unblocked it in 30 seconds.