The Planning / Execution Boundary

The plan/execution boundary is the single most exam-relevant control in agent architecture. Learn where to place it, what validations belong on each side, and how to recognise scenarios where the boundary is missing.

After this topic, you'll be confident about the plan/execution boundary, pre-flight validation, blast radius, and idempotency keys.

If you remember one control from Domain 1, make it this one. The plan/execution boundary is the explicit checkpoint between an agent producing a plan and the runtime acting on the outside world. Almost every exam scenario about safety, autonomy, or guardrails routes through here.

Why the boundary exists

Without it, you cannot:

Audit the plan before it becomes reality.
Block dangerous actions cheaply (instead of cleaning them up expensively).
Compare the agent's intent against the agent's effect — the foundation of every later evaluation.

Mantra: validate the plan, then execute. Never the other way around.

What goes on the plan side (pre-flight)

| Check | What it does |
| --- | --- |
| Schema validation | Does the plan conform to the typed contract? |
| Policy check | Does the plan call only allow-listed tools, against allow-listed resources? |
| Sandbox dry-run | Apply the plan in a throwaway environment first. |
| Blast-radius estimate | What is the maximum damage if this plan misfires? |
| Approval gate | For high-blast-radius actions: a human (or automated policy) signs off. |
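The pre-flight checks listed above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the plan shape (`steps` with `tool` and `resource`), the allow-lists, the blast-radius scores, and the approval threshold are all hypothetical, and the sandbox dry-run is left out because it depends on your environment.

```python
from dataclasses import dataclass, field

# Hypothetical allow-lists, scores, and threshold, chosen only for illustration.
ALLOWED_TOOLS = {"run_migration", "read_table"}
ALLOWED_RESOURCES = {"staging-db", "prod-db"}
BLAST_RADIUS = {"staging-db": 1, "prod-db": 10}
APPROVAL_THRESHOLD = 1  # anything above this needs a sign-off


@dataclass
class Step:
    tool: str
    resource: str


@dataclass
class PreflightResult:
    ok: bool
    needs_approval: bool = False
    errors: list[str] = field(default_factory=list)


def parse_plan(raw: dict) -> list[Step]:
    """Schema validation: each step needs a string 'tool' and 'resource'."""
    steps = raw.get("steps")
    if not isinstance(steps, list) or not steps:
        raise ValueError("plan must contain a non-empty 'steps' list")
    parsed = []
    for i, s in enumerate(steps):
        valid = (
            isinstance(s, dict)
            and isinstance(s.get("tool"), str)
            and isinstance(s.get("resource"), str)
        )
        if not valid:
            raise ValueError(f"step {i} does not match the schema")
        parsed.append(Step(s["tool"], s["resource"]))
    return parsed


def preflight(raw: dict) -> PreflightResult:
    """Run schema, policy, and blast-radius checks without executing anything."""
    try:
        steps = parse_plan(raw)
    except ValueError as exc:
        return PreflightResult(ok=False, errors=[str(exc)])

    # Policy check: only allow-listed tools against allow-listed resources.
    errors = []
    for s in steps:
        if s.tool not in ALLOWED_TOOLS:
            errors.append(f"tool not allow-listed: {s.tool}")
        if s.resource not in ALLOWED_RESOURCES:
            errors.append(f"resource not allow-listed: {s.resource}")
    if errors:
        return PreflightResult(ok=False, errors=errors)

    # Blast-radius estimate: take the worst case; unknown resources count as unbounded.
    worst = max(BLAST_RADIUS.get(s.resource, float("inf")) for s in steps)
    return PreflightResult(ok=True, needs_approval=worst > APPROVAL_THRESHOLD)
```

With these assumed values, a plan that runs `run_migration` against `prod-db` passes schema and policy checks but comes back with `needs_approval=True`, so the runtime should hold it at the approval gate rather than execute it.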

What goes on the execution side (in-flight + post)

| Concern | Control |
| --- | --- |
| Tool failures | Classified retry policy, with an idempotency key. |
| Partial success | Plan steps are individually committed and revertible. |
| Visibility | Each tool call is logged with the trace ID from the plan. |
| Drift | The runtime refuses to execute a plan that has been edited after approval. |
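One way to implement the drift and visibility controls is to record a digest of the plan at approval time and compare it again just before execution. The sketch below assumes a hypothetical plan shape (a `trace_id` plus `steps` with `tool` and optional `args`) and a `tools` dictionary of callables; neither comes from a specific framework.

```python
import hashlib
import json
import logging

logger = logging.getLogger("runtime")


def plan_digest(plan: dict) -> str:
    """Stable SHA-256 digest of a plan, using canonical JSON with sorted keys."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PlanDriftError(Exception):
    """Raised when the plan no longer matches what was approved."""


def execute(plan: dict, approved_digest: str, tools: dict) -> list:
    """Run each step only if the plan is byte-for-byte what was approved."""
    if plan_digest(plan) != approved_digest:
        raise PlanDriftError("plan was modified after approval; refusing to run")

    results = []
    for index, step in enumerate(plan["steps"]):
        # Every tool call carries the plan's trace ID so it can be audited later.
        logger.info(
            "trace_id=%s step=%d tool=%s", plan["trace_id"], index, step["tool"]
        )
        results.append(tools[step["tool"]](**step.get("args", {})))
    return results
```

The approver stores `plan_digest(plan)` alongside the sign-off. Any later edit to the plan, even to a single argument, changes the digest and makes `execute` raise before any tool is called.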

Decide the boundary

An agent has just produced a plan to apply a database migration in production. You own the runtime. Decide how to handle the plan/execution boundary.

The plan is structured JSON and parses correctly. What do you do first?

Where this shows up on the exam

Several Domain 1 questions are answerable just by spotting the missing boundary. If the scenario describes an agent doing something irreversible with no pre-flight, the right answer is always "add a pre-flight validation / approval gate" — never "make the model smarter" or "trust its confidence score".

Anchor concepts

Key terms

Plan / execution boundary: The explicit checkpoint between an agent producing a plan and the runtime taking any side-effecting action on the outside world.
Pre-flight validation: Checks that run on the *plan* before execution: schema validity, policy compliance, dry-run, cost estimate, blast-radius estimate.
Blast radius: The set of resources a tool call can affect if it goes wrong. Larger blast radius → stricter boundary controls.
Idempotency key: A unique identifier sent with a tool call so the runtime can safely retry without duplicating side-effects.
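To make the idempotency key concrete, here is a small sketch of a retry loop that reuses one key across attempts. `FakeTool` and `TransientError` are hypothetical stand-ins: the fake tool applies its side effect, then "loses" the response a configurable number of times, which is the case where a retry without a key would duplicate the effect.

```python
import uuid


class TransientError(Exception):
    """Hypothetical error class for failures classified as safe to retry."""


class FakeTool:
    """Stand-in for a side-effecting tool that deduplicates by idempotency key."""

    def __init__(self, fail_first: int = 0):
        self.completed: dict[str, str] = {}
        self.side_effects = 0
        self._failures_left = fail_first

    def call(self, idempotency_key: str, payload: str) -> str:
        if idempotency_key in self.completed:
            # Replay: return the stored result, apply no new side effect.
            return self.completed[idempotency_key]
        self.side_effects += 1
        result = f"done:{payload}"
        self.completed[idempotency_key] = result
        if self._failures_left > 0:
            self._failures_left -= 1
            raise TransientError("response lost after the side effect was applied")
        return result


def call_with_retry(tool: FakeTool, payload: str, max_attempts: int = 3) -> str:
    """Retry transient failures, reusing the same key for every attempt."""
    key = str(uuid.uuid4())  # one key per logical call, not per attempt
    for attempt in range(1, max_attempts + 1):
        try:
            return tool.call(key, payload)
        except TransientError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")
```

The important design choice is that the key is generated once, outside the loop. Generating a fresh key per attempt would defeat deduplication and let a retried call apply its side effect twice.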

Watch out

Common pitfalls

Placing the boundary 'after' execution and relying on revert — revert is not a boundary control, it is a cleanup tool.
Letting the agent's self-rated confidence gate execution. A self-rating cannot be audited independently and is often poorly calibrated.
Treating low-line-count diffs as low-risk. A one-line change to a feature flag or migration script can have maximum blast radius.