
Structured Plan Output and Plan Validation

A 'plan' that is just a paragraph of text is not a plan; it is a hope. Learn the schema GH-600 expects for a structured plan, the validations you can apply to it, and the failure modes a structured plan lets you detect.

After this topic, you should be confident with structured plans, plan schemas, plan validation, and one more concept: the step ledger.


A plan is a contract, not a paragraph. The exam expects you to recognise a well-formed plan, name the validations applied to it, and explain why structured plans unlock everything else (eval, traceability, replay).

The minimum plan schema

```json
{
  "plan_id": "abc-123",
  "goal": "Triage and label issue #4821",
  "steps": [
    {
      "id": "s1",
      "tool": "github.issues.get",
      "args": { "issue_number": 4821 },
      "precondition": "issue exists and is open",
      "postcondition": "issue body fetched into working memory"
    },
    {
      "id": "s2",
      "tool": "github.issues.addLabels",
      "args": { "issue_number": 4821, "labels": ["bug", "priority:high"] },
      "precondition": "s1 succeeded; classification = bug/high",
      "postcondition": "labels visible on issue"
    }
  ],
  "budget": { "max_steps": 8, "max_tokens": 50000, "wall_seconds": 60 }
}
```

Notice what is not in the schema: free-form prose, self-rated confidence, and hidden (undeclared) tools. Anything not in the schema is rejected at the boundary.
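Rejecting anything outside the schema "at the boundary" can be sketched as a strict parser. The sketch below is a minimal illustration, assuming the field names from the example plan above are the complete schema. It is not an official GH-600 schema definition, and the function and exception names are hypothetical. A production runtime would more likely use a JSON Schema validator with the equivalent of `additionalProperties: false`.

```python
import json
from typing import Any

# Field sets mirror the example plan above. They are an assumption for
# illustration, not an official GH-600 schema definition.
PLAN_FIELDS = {"plan_id", "goal", "steps", "budget"}
STEP_FIELDS = {"id", "tool", "args", "precondition", "postcondition"}
BUDGET_FIELDS = {"max_steps", "max_tokens", "wall_seconds"}


class PlanSchemaError(ValueError):
    """Raised for a malformed plan; nothing executes if this is raised."""


def _check_keys(obj: Any, expected: set, where: str) -> None:
    if not isinstance(obj, dict):
        raise PlanSchemaError(f"{where}: expected an object")
    missing = expected - set(obj)
    extra = set(obj) - expected
    if missing:
        raise PlanSchemaError(f"{where}: missing fields {sorted(missing)}")
    if extra:
        # Free-form prose, self-rated confidence, etc. are rejected here.
        raise PlanSchemaError(f"{where}: unexpected fields {sorted(extra)}")


def _require_str(value: Any, where: str) -> None:
    if not isinstance(value, str) or not value:
        raise PlanSchemaError(f"{where}: expected a non-empty string")


def validate_plan_schema(raw: str) -> dict:
    """Parse a plan from JSON and reject it unless it matches the schema exactly."""
    plan = json.loads(raw)
    _check_keys(plan, PLAN_FIELDS, "plan")
    _require_str(plan["plan_id"], "plan.plan_id")
    _require_str(plan["goal"], "plan.goal")
    if not isinstance(plan["steps"], list) or not plan["steps"]:
        raise PlanSchemaError("plan.steps: expected a non-empty list")
    seen_ids = set()
    for i, step in enumerate(plan["steps"]):
        _check_keys(step, STEP_FIELDS, f"steps[{i}]")
        for key in ("id", "tool", "precondition", "postcondition"):
            _require_str(step[key], f"steps[{i}].{key}")
        if not isinstance(step["args"], dict):
            raise PlanSchemaError(f"steps[{i}].args: expected an object")
        if step["id"] in seen_ids:
            raise PlanSchemaError(f"steps[{i}].id: duplicate step id {step['id']!r}")
        seen_ids.add(step["id"])
    _check_keys(plan["budget"], BUDGET_FIELDS, "budget")
    for key, value in plan["budget"].items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            raise PlanSchemaError(f"budget.{key}: expected a positive integer")
    return plan
```

`json.loads` raises `json.JSONDecodeError` (itself a `ValueError` subclass) for text that is not JSON at all. A caller that catches `ValueError` therefore rejects both unparseable and malformed plans in one place.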

The validations the runtime applies

| Validation | What it catches |
| --- | --- |
| Schema validity | Malformed plans (missing fields, wrong types). |
| Tool allow-list | The agent picked a tool it does not have permission to use. |
| Argument policy | Disallowed or out-of-scope arguments, e.g. force=true is never allowed and resource IDs must be within scope. |
| Budget check | The plan exceeds its step, token, or time budget before it even starts. |
| Blast-radius estimate | High-blast-radius plans get escalated to human approval. |
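The four checks after schema validity can be sketched as one function that returns a decision: reject, escalate to a human, or approve. This is a minimal sketch that assumes the plan has already passed schema validation. The tool names come from the example plan. The scope set, runtime ceilings, blast-radius weights, and escalation threshold are all hypothetical values chosen only for illustration. Before execution, the budget check can only compare the plan's declared budget and step count against limits. Enforcing actual token and time use has to happen while the plan runs.

```python
from dataclasses import dataclass, field

# All names and numbers below are hypothetical placeholders for illustration.
ALLOWED_TOOLS = {"github.issues.get", "github.issues.addLabels"}
IN_SCOPE_ISSUES = {4821}
RUNTIME_CEILING = {"max_steps": 20, "max_tokens": 100_000, "wall_seconds": 300}
BLAST_WEIGHT = {"github.issues.get": 0, "github.issues.addLabels": 1}  # reads cost 0, writes cost 1
ESCALATE_AT = 5


@dataclass
class Verdict:
    decision: str  # "approve", "escalate" or "reject"
    reasons: list = field(default_factory=list)


def validate_plan_policy(plan: dict) -> Verdict:
    reasons = []
    for step in plan["steps"]:
        sid, tool, args = step["id"], step["tool"], step["args"]
        # Tool allow-list.
        if tool not in ALLOWED_TOOLS:
            reasons.append(f"{sid}: tool {tool!r} is not allow-listed")
        # Argument policy: banned flags and out-of-scope resource IDs.
        if args.get("force") is True:
            reasons.append(f"{sid}: force=true is never allowed")
        if "issue_number" in args and args["issue_number"] not in IN_SCOPE_ISSUES:
            reasons.append(f"{sid}: issue {args['issue_number']} is out of scope")
    # Budget check, before anything runs.
    budget = plan["budget"]
    if len(plan["steps"]) > budget["max_steps"]:
        reasons.append("plan has more steps than its own max_steps")
    for key, ceiling in RUNTIME_CEILING.items():
        if budget[key] > ceiling:
            reasons.append(f"budget.{key}={budget[key]} exceeds runtime ceiling {ceiling}")
    if reasons:
        return Verdict("reject", reasons)
    # Blast-radius estimate: a crude weighted count of write operations.
    blast = sum(BLAST_WEIGHT.get(s["tool"], ESCALATE_AT) for s in plan["steps"])
    if blast >= ESCALATE_AT:
        return Verdict("escalate", [f"blast-radius score {blast} >= {ESCALATE_AT}"])
    return Verdict("approve")
```

The hard failures (allow-list, argument policy, budget) reject outright. Only a plan that is otherwise valid but large in impact is escalated to a human.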

Plan-vs-execution divergence

Because the plan is a contract, the runtime can detect when execution diverges from it. If execution reaches a step s3 that calls a tool not listed in the plan, you do not tolerate it; you block the call and log it. This is the single most important reason to use structured plans: it converts "the agent went off-script" from a vibe into a programmatic signal.
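Blocking divergence can be sketched as a guard that every tool call passes through before it executes. This is a minimal sketch and not a description of any specific runtime. `PlanGuard`, `PlannedStep`, `DivergenceBlocked`, and the logger name are hypothetical. Requiring exact argument equality is a simplifying assumption: in the example plan, s2's labels depend on s1's classification, so a real runtime may need a looser argument match.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("plan-runtime")  # hypothetical logger name


class DivergenceBlocked(RuntimeError):
    """Raised when execution attempts something the approved plan did not declare."""


@dataclass(frozen=True)
class PlannedStep:
    id: str
    tool: str
    args: dict = field(default_factory=dict)


class PlanGuard:
    """Hypothetical runtime guard: every tool call is checked against the plan first."""

    def __init__(self, plan_id: str, steps: list):
        self.plan_id = plan_id
        self._steps = {s.id: s for s in steps}  # read-only view of the contract
        self.blocked = []  # divergence events kept for the post-mortem

    def check(self, step_id: str, tool: str, args: dict) -> None:
        planned = self._steps.get(step_id)
        if planned is not None and planned.tool == tool and planned.args == args:
            return  # matches the contract; the caller may execute the call
        event = {
            "plan_id": self.plan_id,
            "step_id": step_id,
            "attempted_tool": tool,
            "attempted_args": args,
            "planned_tool": planned.tool if planned else None,
        }
        self.blocked.append(event)
        log.warning("plan divergence blocked: %s", event)
        # Block rather than tolerate. The plan itself is never edited here.
        raise DivergenceBlocked(f"{step_id}: {tool} does not match plan {self.plan_id}")
```

The guard only reads the plan. A divergence produces a logged event and an exception, never a silent edit to the plan, which keeps the plan and the execution record separate.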

Exam tip: any option that retroactively edits the plan during execution is wrong. The plan is the contract; the execution log is separate.

The step ledger

Each executed step writes a row to the ledger:

| Field | Purpose |
| --- | --- |
| step_id | Joins back to the plan. |
| started_at / finished_at | Latency and ordering. |
| outcome | success / validation_failed / tool_error / policy_blocked. |
| artifacts | Diffs, file paths, response payloads (for inspection). |

Together, the plan + ledger are the inspectable artifact a reviewer reads the next morning.
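A ledger row per executed step, and the join a reviewer would read, can be sketched as follows. The field names and the four outcome values come from the table above. The two exception classes used to tell validation failures from policy blocks are hypothetical, and timestamps are wall-clock seconds from `time.time()`. This is a minimal sketch under those assumptions.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Outcome(str, Enum):
    SUCCESS = "success"
    VALIDATION_FAILED = "validation_failed"
    TOOL_ERROR = "tool_error"
    POLICY_BLOCKED = "policy_blocked"


class StepValidationFailed(Exception):
    """Hypothetical: a step's output or postcondition failed validation."""


class StepPolicyBlocked(Exception):
    """Hypothetical: a runtime policy check stopped the step."""


@dataclass
class LedgerRow:
    step_id: str        # joins back to the plan
    started_at: float   # wall-clock seconds; with finished_at gives latency and ordering
    finished_at: float
    outcome: Outcome
    artifacts: dict = field(default_factory=dict)


def run_step(ledger: list, step_id: str, action: Callable[[], dict]) -> LedgerRow:
    """Run one step and append exactly one ledger row, whatever the outcome."""
    started = time.time()
    try:
        artifacts, outcome = action(), Outcome.SUCCESS
    except StepPolicyBlocked as exc:
        artifacts, outcome = {"error": str(exc)}, Outcome.POLICY_BLOCKED
    except StepValidationFailed as exc:
        artifacts, outcome = {"error": str(exc)}, Outcome.VALIDATION_FAILED
    except Exception as exc:  # any other failure is attributed to the tool
        artifacts, outcome = {"error": repr(exc)}, Outcome.TOOL_ERROR
    row = LedgerRow(step_id, started, time.time(), outcome, artifacts)
    ledger.append(row)
    return row


def review(plan_steps: list, ledger: list) -> list:
    """Pair each planned step with its ledger row (None if it never ran)."""
    rows = {row.step_id: row for row in ledger}
    return [(step["id"], step["tool"], rows.get(step["id"])) for step in plan_steps]
```

Writing a row on every path, including failures, is what makes the ledger trustworthy the next morning. A planned step with no row shows up as `None` in the review instead of disappearing.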


Where this shows up on the exam

Expect questions asking you to pick the right plan schema, and questions phrased as "the agent did X, which wasn't in the plan; what should the runtime do?". Always: validate, block the divergence, and log it. Never: retroactively edit the plan, trust the agent's self-rated confidence, or apologise to the user.

Anchor concepts

Key terms

Structured plan
A typed, ordered list of steps where each step names the tool to call, its arguments, its precondition, and its expected postcondition.
Plan schema
The JSON / type definition the runtime uses to validate that the agent produced a well-formed plan before any step executes.
Plan validation
The set of checks applied to a structured plan: schema-valid, only allow-listed tools, arguments within policy, total cost / blast radius under budget.
Step ledger
The execution log paired with the plan: each step records started_at, finished_at, outcome, and any artifacts produced.
Watch out

Common pitfalls

  • Storing the plan as free-form markdown so reviewers can read it 'naturally', then having no programmatic way to validate or replay it.
  • Letting the agent mutate the plan mid-execution without versioning. Now the post-mortem can't tell which plan was actually run.
  • Forgetting the precondition/postcondition fields. Without them you cannot detect drift between expected and actual state.
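One way to avoid the second pitfall is to treat plans as append-only versions identified by a content hash. Each ledger row can then record which hash was executing. This is a minimal sketch: `plan_hash` and `PlanStore` are hypothetical names, and using SHA-256 over key-sorted JSON is an assumed canonicalisation, not something the exam prescribes. It does not loosen the contract. A revision becomes a new, separately identifiable plan rather than an edit to the one already running.

```python
import copy
import hashlib
import json


def plan_hash(plan: dict) -> str:
    """Content hash of a plan; key order does not affect the result."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PlanStore:
    """Append-only: a revised plan becomes a new version, never an in-place edit."""

    def __init__(self) -> None:
        self._versions = []  # (hash, snapshot) pairs in commit order

    def commit(self, plan: dict) -> str:
        h = plan_hash(plan)
        if not self._versions or self._versions[-1][0] != h:
            # Deep copy so later mutation of the caller's dict cannot alter history.
            self._versions.append((h, copy.deepcopy(plan)))
        return h

    def get(self, h: str) -> dict:
        for version_hash, snapshot in self._versions:
            if version_hash == h:
                return copy.deepcopy(snapshot)
        raise KeyError(h)

    def history(self) -> list:
        return [h for h, _ in self._versions]
```

If every ledger row carries the `plan_hash` of the plan it ran under, a post-mortem can look up the exact version with `store.get(row_hash)`, even after later revisions.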