Structured Plan Output and Plan Validation
A 'plan' that is just a paragraph of text is not a plan; it is a hope. Learn the schema GH-600 expects for a structured plan, the validations you can apply to it, and the failure modes a structured plan unlocks.
A plan is a contract, not a paragraph. The exam expects you to recognise a well-formed plan, name the validations applied to it, and explain why structured plans unlock everything else (eval, traceability, replay).
The minimum plan schema
{
  "plan_id": "abc-123",
  "goal": "Triage and label issue #4821",
  "steps": [
    {
      "id": "s1",
      "tool": "github.issues.get",
      "args": { "issue_number": 4821 },
      "precondition": "issue exists and is open",
      "postcondition": "issue body fetched into working memory"
    },
    {
      "id": "s2",
      "tool": "github.issues.addLabels",
      "args": { "issue_number": 4821, "labels": ["bug", "priority:high"] },
      "precondition": "s1 succeeded; classification = bug/high",
      "postcondition": "labels visible on issue"
    }
  ],
  "budget": { "max_steps": 8, "max_tokens": 50000, "wall_seconds": 60 }
}
Notice what is not in the schema: free-form prose, self-rated confidence, hidden tools. Anything not in the schema is rejected at the boundary.
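As a rough illustration of "rejected at the boundary", the sketch below checks a plan against the field names used in the example above and reports anything missing, mistyped, or unexpected. The names `PLAN_FIELDS`, `STEP_FIELDS`, `BUDGET_FIELDS`, `check_object`, and `validate_schema` are hypothetical and not part of any GH-600 material; a real runtime might use a schema library instead.

```python
from typing import Any

# Hypothetical field sets, copied from the example plan above.
PLAN_FIELDS = {"plan_id": str, "goal": str, "steps": list, "budget": dict}
STEP_FIELDS = {"id": str, "tool": str, "args": dict,
               "precondition": str, "postcondition": str}
BUDGET_FIELDS = {"max_steps": int, "max_tokens": int, "wall_seconds": int}


def check_object(obj: Any, fields: dict[str, type], where: str) -> list[str]:
    """Return schema errors for one JSON object (empty list if it is valid)."""
    if not isinstance(obj, dict):
        return [f"{where}: expected an object"]
    errors = []
    for name in sorted(fields.keys() - obj.keys()):
        errors.append(f"{where}: missing field '{name}'")
    # Unknown fields (free-form prose, confidence scores, ...) are errors too.
    for name in sorted(obj.keys() - fields.keys()):
        errors.append(f"{where}: unexpected field '{name}'")
    for name, expected in fields.items():
        if name in obj and not isinstance(obj[name], expected):
            errors.append(f"{where}.{name}: expected {expected.__name__}")
    return errors


def validate_schema(plan: Any) -> list[str]:
    """Validate the top-level plan, its budget, and every step."""
    errors = check_object(plan, PLAN_FIELDS, "plan")
    if errors:
        return errors  # nested checks need a well-formed top level
    errors += check_object(plan["budget"], BUDGET_FIELDS, "budget")
    for i, step in enumerate(plan["steps"]):
        errors += check_object(step, STEP_FIELDS, f"steps[{i}]")
    return errors
```

A step that carries an extra `"confidence"` key would come back with an `unexpected field` error rather than being silently accepted, which is the behaviour the paragraph above describes.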
The validations the runtime applies
| Validation | What it catches |
| --- | --- |
| Schema validity | Malformed plans (missing fields, wrong types). |
| Tool allow-list | The agent picked a tool it does not have permission for. |
| Argument policy | Forbidden or out-of-scope arguments, e.g. force=true (never allowed) or resource IDs outside the agent's scope. |
| Budget check | Plans that would exceed the step / token / time budget, rejected before execution starts. |
| Blast-radius estimate | High-blast-radius plans get escalated to human approval. |
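The checks in the table can be sketched as one function that runs after schema validation. Everything configurable here is an assumption made for illustration: the allow-list, the set of state-changing tools, the in-scope issue numbers, and the rule that three or more write steps count as high blast radius. Token and time budgets are omitted because checking them before execution needs a cost estimate the example plan does not carry.

```python
from typing import Any

# Hypothetical policy inputs; a real runtime would load these from configuration.
ALLOWED_TOOLS = {"github.issues.get", "github.issues.addLabels"}
WRITE_TOOLS = {"github.issues.addLabels"}  # tools that change state
IN_SCOPE_ISSUES = {4821}
APPROVAL_THRESHOLD = 3  # assumed: this many write steps needs human approval


def validate_policy(plan: dict[str, Any]) -> tuple[list[str], bool]:
    """Return (violations, needs_human_approval) for a schema-valid plan."""
    violations: list[str] = []
    steps = plan["steps"]
    for step in steps:
        sid, tool, args = step["id"], step["tool"], step["args"]
        # Tool allow-list
        if tool not in ALLOWED_TOOLS:
            violations.append(f"{sid}: tool '{tool}' is not allow-listed")
        # Argument policy: forbidden flags and out-of-scope resources
        if args.get("force") is True:
            violations.append(f"{sid}: force=true is never allowed")
        issue = args.get("issue_number")
        if issue is not None and issue not in IN_SCOPE_ISSUES:
            violations.append(f"{sid}: issue {issue} is out of scope")
    # Budget check (step count only in this sketch)
    if len(steps) > plan["budget"]["max_steps"]:
        violations.append("plan exceeds max_steps budget")
    # Blast-radius estimate: a crude count of state-changing steps
    write_count = sum(1 for s in steps if s["tool"] in WRITE_TOOLS)
    return violations, write_count >= APPROVAL_THRESHOLD
```

Returning the approval flag separately from the violations keeps "reject" and "escalate to a human" as distinct outcomes, matching the last row of the table.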
Plan-vs-execution divergence
Because the plan is a contract, the runtime can detect when execution diverges. If execution tries to call a tool that is not listed in the plan (say, a step s3 that the plan never declared), you don't tolerate it: you block and log it. This is the single most important reason to use structured plans: it converts "the agent went off-script" from a vibe into a programmatic signal.
Exam tip: any option that retroactively edits the plan during execution is wrong. The plan is the contract; the execution log is separate.
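A minimal sketch of the block-and-log rule, assuming the runtime routes every tool call through a guard before dispatching it. The function name `guard_call` and the shape of the log record are hypothetical; the `policy_blocked` value is the outcome name used in the step ledger.

```python
from typing import Any


def guard_call(plan: dict[str, Any], step_id: str, tool: str,
               args: dict[str, Any], log: list[dict[str, Any]]) -> bool:
    """Allow a tool call only if it matches the planned step exactly.

    On divergence the call is blocked and a record is appended to `log`.
    The plan itself is never modified.
    """
    planned = {s["id"]: s for s in plan["steps"]}
    step = planned.get(step_id)
    if step is not None and step["tool"] == tool and step["args"] == args:
        return True
    log.append({
        "step_id": step_id,
        "outcome": "policy_blocked",
        "attempted_tool": tool,
        "planned_tool": step["tool"] if step else None,
    })
    return False
```

The guard only reads the plan and only appends to the log, so the two artifacts stay separate, as the exam tip requires.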
The step ledger
Each executed step writes a row to the ledger:
| Field | Purpose |
| --- | --- |
| step_id | Joins back to the plan. |
| started_at / finished_at | Latency + ordering. |
| outcome | success / validation_failed / tool_error / policy_blocked. |
| artifacts | Diffs, file paths, response payloads (for inspection). |
Together, the plan + ledger are the inspectable artifact a reviewer reads the next morning.
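One way to model a ledger row, assuming the four fields and the four outcome values from the table. The class names `Outcome` and `LedgerRow` and the helper `unexecuted_steps` are hypothetical, chosen only to show how `step_id` joins the ledger back to the plan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Any


class Outcome(str, Enum):
    SUCCESS = "success"
    VALIDATION_FAILED = "validation_failed"
    TOOL_ERROR = "tool_error"
    POLICY_BLOCKED = "policy_blocked"


@dataclass(frozen=True)  # rows are append-only, so make them immutable
class LedgerRow:
    step_id: str
    started_at: datetime
    finished_at: datetime
    outcome: Outcome
    artifacts: tuple[str, ...] = ()  # e.g. diff or file paths

    @property
    def latency_seconds(self) -> float:
        return (self.finished_at - self.started_at).total_seconds()


def unexecuted_steps(plan: dict[str, Any], ledger: list[LedgerRow]) -> list[str]:
    """Join on step_id: planned steps that have no ledger row yet."""
    seen = {row.step_id for row in ledger}
    return [s["id"] for s in plan["steps"] if s["id"] not in seen]


# Usage with made-up timestamps.
start = datetime(2024, 1, 1, 9, 0, 0, tzinfo=timezone.utc)
row = LedgerRow("s1", start, start + timedelta(seconds=2), Outcome.SUCCESS)
plan = {"steps": [{"id": "s1"}, {"id": "s2"}]}
remaining = unexecuted_steps(plan, [row])  # ["s2"]
```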
Quick check
Which of the following is closest to the **minimum** structure GH-600 expects in a plan step?
Where this shows up on the exam
Expect questions asking you to pick the right plan schema, and questions phrased as "the agent did X that wasn't in the plan; what should the runtime do?". Always: validate, block divergence, log it. Never: retroactively edit, trust the agent's confidence, or apologise to the user.
Key terms
- Structured plan
- A typed, ordered list of steps where each step names the tool to call, the arguments, the precondition, and the expected postcondition.
- Plan schema
- The JSON / type definition the runtime uses to validate that the agent produced a well-formed plan before any step executes.
- Plan validation
- The set of checks applied to a structured plan: schema-valid, only allow-listed tools, arguments within policy, total cost / blast radius under budget.
- Step ledger
- The execution log paired with the plan: each step records started_at, finished_at, outcome, and any artifacts produced.
Common pitfalls
- Storing the plan as free-form markdown so reviewers can read it 'naturally', then having no programmatic way to validate or replay it.
- Letting the agent mutate the plan mid-execution without versioning. Now the post-mortem can't tell which plan was actually run.
- Forgetting the precondition/postcondition fields. Without them you cannot detect drift between expected and actual state.
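For the versioning pitfall, one possible approach (an assumption here, not something the exam material prescribes) is to fingerprint the plan's canonical JSON and record that fingerprint alongside the ledger, so a post-mortem can tell exactly which plan was run. The function name `plan_fingerprint` is hypothetical.

```python
import hashlib
import json
from typing import Any


def plan_fingerprint(plan: dict[str, Any]) -> str:
    """SHA-256 of the plan's canonical JSON (sorted keys, no extra whitespace)."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Usage: any edit to the plan yields a different fingerprint.
original = {"plan_id": "abc-123", "steps": [{"id": "s1"}]}
revised = {"plan_id": "abc-123", "steps": [{"id": "s1"}, {"id": "s2"}]}
changed = plan_fingerprint(original) != plan_fingerprint(revised)
```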