The Planning / Execution Boundary
The plan/execution boundary is the single most exam-relevant control in agent architecture. Learn where to place it, what validations belong on each side, and how to recognise scenarios where the boundary is missing.
If you remember one control from Domain 1, make it this one. The plan/execution boundary is the explicit checkpoint between an agent producing a plan and the runtime acting on the outside world. Almost every exam scenario about safety, autonomy, or guardrails routes through here.
Why the boundary exists
Without it, you cannot:
- Audit the plan before it becomes reality.
- Block dangerous actions cheaply (instead of cleaning them up expensively).
- Compare the agent's intent against the agent's effect, which is the foundation of every later evaluation.
Mantra: validate the plan, then execute. Never the other way around.
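One way to make "validate, then execute" structural rather than a matter of discipline is to give the executor an input type that only the validation step produces. The sketch below assumes a hypothetical `Plan` / `ValidatedPlan` pair and pluggable check functions. Python cannot actually stop other code from constructing `ValidatedPlan`, so this is a convention backed by code review, not a security guarantee.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Plan:
    """Raw plan as produced by the agent (hypothetical shape: a tuple of step strings)."""
    steps: Tuple[str, ...]


@dataclass(frozen=True)
class ValidatedPlan:
    """By convention, only boundary() constructs this; execute() accepts nothing else."""
    plan: Plan


class PlanRejected(Exception):
    """Raised on the plan side, before any side effect has happened."""


Check = Callable[[Plan], Optional[str]]  # returns an error message, or None if the plan passes


def boundary(plan: Plan, checks: List[Check]) -> ValidatedPlan:
    # Run every pre-flight check and collect all failures, not just the first.
    errors = [msg for msg in (check(plan) for check in checks) if msg is not None]
    if errors:
        raise PlanRejected("; ".join(errors))
    return ValidatedPlan(plan)


def execute(validated: ValidatedPlan) -> List[str]:
    # The execution side refuses a raw Plan: validate first, never the other way around.
    if not isinstance(validated, ValidatedPlan):
        raise TypeError("execute() only accepts a ValidatedPlan")
    return [f"ran: {step}" for step in validated.plan.steps]  # placeholder for real tool calls
```

Collecting every failure rather than stopping at the first gives a reviewer the full picture in one pass.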
What goes on the plan side (pre-flight)
| Check | What it does |
| --- | --- |
| Schema validation | Does the plan conform to the typed contract? |
| Policy check | Does the plan call only allow-listed tools, against allow-listed resources? |
| Sandbox dry-run | Apply the plan in a throwaway environment first. |
| Blast-radius estimate | What is the maximum damage if this plan misfires? |
| Approval gate | For high-blast-radius actions: a human (or automated policy) signs off. |
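Four of the five pre-flight rows can be sketched as a single function. Everything here is an assumption for illustration: the plan shape (`{"steps": [{"tool": ..., "resource": ...}]}`), the allow-lists, the per-resource blast-radius weights, and the approval threshold. The sandbox dry-run is left out because it depends on having a throwaway environment to apply the plan to.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical policy: allow-listed tools and resources, with a rough
# blast-radius weight per resource (higher = more damage if it misfires).
ALLOWED_TOOLS = {"run_sql", "read_logs"}
ALLOWED_RESOURCES = {"app_logs": 0, "staging_db": 1, "prod_db": 10}
APPROVAL_THRESHOLD = 5


@dataclass
class PreflightResult:
    errors: List[str] = field(default_factory=list)
    blast_radius: int = 0
    needs_approval: bool = False


def preflight(plan: dict, approved_by: Optional[str] = None) -> PreflightResult:
    result = PreflightResult()
    steps = plan.get("steps")
    # 1. Schema validation: a non-empty list of steps with string "tool" and "resource".
    if not isinstance(steps, list) or not steps:
        result.errors.append("schema: 'steps' must be a non-empty list")
        return result
    for i, step in enumerate(steps):
        if not isinstance(step, dict) or not all(
            isinstance(step.get(k), str) for k in ("tool", "resource")
        ):
            result.errors.append(f"schema: step {i} needs string 'tool' and 'resource'")
            continue
        # 2. Policy check: only allow-listed tools against allow-listed resources.
        if step["tool"] not in ALLOWED_TOOLS:
            result.errors.append(f"policy: tool {step['tool']!r} not allow-listed")
        if step["resource"] not in ALLOWED_RESOURCES:
            result.errors.append(f"policy: resource {step['resource']!r} not allow-listed")
            continue
        # 3. Blast-radius estimate: the worst single resource the plan touches.
        result.blast_radius = max(result.blast_radius, ALLOWED_RESOURCES[step["resource"]])
    # 4. Approval gate: a high blast radius requires a named approver.
    result.needs_approval = result.blast_radius >= APPROVAL_THRESHOLD
    if result.needs_approval and not approved_by:
        result.errors.append("approval: blast radius requires sign-off")
    return result
```

The plan proceeds to execution only when `errors` is empty; anything else is rejected while it is still cheap to reject.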
What goes on the execution side (in-flight + post)
| Concern | Control |
| --- | --- |
| Tool failures | Classified retry policy, with an idempotency key. |
| Partial success | Plan steps are individually committed and revertible. |
| Visibility | Each tool call is logged with the trace ID from the plan. |
| Drift | The runtime refuses to execute a plan that has been edited after approval. |
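The drift and visibility controls can be sketched with a content hash and a trace ID. The sketch assumes a hypothetical approval record that stores the SHA-256 digest of the plan's canonical JSON at sign-off time; the runtime recomputes the digest before running and refuses on any mismatch. `approve`, `execute`, and the plan shape are illustrative names, not a specific framework's API.

```python
import hashlib
import json
import logging
import uuid
from typing import List

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("runtime")


def plan_digest(plan: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so equal plans hash equally.
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approve(plan: dict) -> dict:
    # Hypothetical approval record: the digest of exactly what was signed off,
    # plus a trace ID that every later tool call will carry.
    return {"digest": plan_digest(plan), "trace_id": str(uuid.uuid4())}


def execute(plan: dict, approval: dict) -> List[str]:
    # Drift control: refuse any plan that differs from the approved one.
    if plan_digest(plan) != approval["digest"]:
        raise PermissionError("plan changed after approval; re-run pre-flight")
    done = []
    for i, step in enumerate(plan["steps"]):
        # Visibility: every tool call is logged with the plan's trace ID.
        log.info("trace=%s step=%d tool=%s", approval["trace_id"], i, step["tool"])
        done.append(step["tool"])  # placeholder for the real tool call
    return done
```

Hashing the canonical form means reordering keys does not count as drift, but changing any value does.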
Decide the boundary
An agent has just produced a plan to apply a database migration in production. You own the runtime. Decide how to handle the plan/execution boundary.
The plan is structured JSON and parses correctly. What do you do first?
Where this shows up on the exam
Several Domain 1 questions are answerable just by spotting the missing boundary. If the scenario describes an agent doing something irreversible with no pre-flight, the right answer is always "add a pre-flight validation / approval gate", never "make the model smarter" or "trust its confidence score".
Key terms
- Plan / execution boundary: the explicit checkpoint between an agent producing a plan and the runtime taking any side-effecting action on the outside world.
- Pre-flight validation: checks that run on the *plan* before execution, such as schema validity, policy compliance, dry-run, cost estimate, and blast-radius estimate.
- Blast radius: the set of resources a tool call can affect if it goes wrong. The larger the blast radius, the stricter the boundary controls it needs.
- Idempotency key: a unique identifier sent with a tool call so the runtime can safely retry it without duplicating side effects.
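The idempotency key and the classified retry policy work together: the runtime retries only failures classed as transient, and reuses the same key on every retry so the tool can recognise a repeat. The sketch below uses a hypothetical in-memory `FakeTool` that simulates the awkward case where the write succeeded but the response was lost; a real tool would have to persist keys on its own side for this to hold across restarts.

```python
import uuid
from typing import Dict


class TransientError(Exception):
    """Retryable failure class: timeouts, throttling, dropped connections."""


class PermanentError(Exception):
    """Non-retryable failure class: invalid input, permission denied."""


class FakeTool:
    """Hypothetical side-effecting tool that deduplicates on idempotency key."""

    def __init__(self, lost_responses: int = 0):
        self.lost_responses = lost_responses  # how many replies to "lose" after writing
        self.results: Dict[str, str] = {}
        self.side_effects = 0
        self.calls = 0

    def call(self, payload: str, idempotency_key: str) -> str:
        self.calls += 1
        if not payload:
            raise PermanentError("empty payload")
        if idempotency_key in self.results:
            # Seen this key before: return the stored result instead of acting again.
            return self.results[idempotency_key]
        self.side_effects += 1
        self.results[idempotency_key] = f"applied {payload}"
        if self.lost_responses > 0:
            # The effect happened, but the caller only sees a timeout.
            self.lost_responses -= 1
            raise TransientError("timeout after write")
        return self.results[idempotency_key]


def call_with_retry(tool: FakeTool, payload: str, max_attempts: int = 3) -> str:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    key = str(uuid.uuid4())  # one key per logical action, reused on every retry
    last_error: Exception = TransientError("no attempts made")
    for _ in range(max_attempts):
        try:
            return tool.call(payload, idempotency_key=key)
        except TransientError as exc:
            last_error = exc  # retryable class: try again with the same key
        # PermanentError is deliberately not caught: retrying cannot fix it.
    raise last_error
```

In the lost-response case the tool is called twice but records only one side effect; without the key, the retry would have applied the change a second time.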
Common pitfalls
- Placing the boundary 'after' execution and relying on revert: revert is not a boundary control, it is a cleanup tool.
- Letting the agent's self-rated confidence gate execution. Self-rating is unauditable and often poorly calibrated.
- Treating low-line-count diffs as low-risk. A one-line change to a feature flag or migration script can have maximum blast radius.
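One way to avoid the last pitfall is to key risk off what a diff touches rather than how large it is. The patterns below are hypothetical examples; a real repository would maintain its own list of high-blast-radius paths.

```python
from fnmatch import fnmatch
from typing import Dict

# Hypothetical patterns for files whose changes can have maximum blast radius,
# however few lines the diff touches.
HIGH_RISK_PATTERNS = ["*migrations/*", "*feature_flags*", "*.sql"]


def diff_risk(changed_lines: Dict[str, int]) -> str:
    """Classify a diff given as {path: number_of_changed_lines}.

    Line counts are deliberately ignored: risk comes from what is touched.
    """
    for path in changed_lines:
        if any(fnmatch(path, pattern) for pattern in HIGH_RISK_PATTERNS):
            return "high"
    return "normal"
```

Under this rule a one-line migration is classed "high" while a 400-line refactor of ordinary application code is "normal".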