Document Handoffs, Decisions and Outcomes

When control passes between agents — or from an agent to a human — the receiving party needs three things: what was decided, why, and what the next expected outcome is. Learn the artifacts and structures that make handoffs auditable.

After this topic, you'll be confident about four concepts: Handoff, Decision record, Outcome log, and Trace ID.

A multi-agent system is only as auditable as its handoffs. When control moves from planner to executor, from executor to reviewer, or from agent to human, three artifacts have to travel with it: what was decided, why, and — once the action runs — what actually happened.

The three artifacts

| Artifact | What it captures | Where it lives |
| --- | --- | --- |
| Handoff payload | The plan, the constraints, rejected alternatives, a trace ID | A structured JSON/YAML object passed to the next step |
| Decision record | The choice, the rationale, the actor, the timestamp | A persistent store (PR description, ADR file, workflow log) |
| Outcome log | The post-action observation: did it work? metrics? errors? | A queryable log keyed by trace ID |
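As a rough illustration of the handoff payload row, the sketch below models the four fields as a Python dataclass and serializes them to JSON. The class names, the `new_trace_id` helper, and the sample plan and constraints are all hypothetical; the lesson does not prescribe a schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class RejectedAlternative:
    option: str  # what was considered
    reason: str  # why it was not chosen


@dataclass
class HandoffPayload:
    trace_id: str           # links this payload to later records
    plan: list[str]         # ordered steps the receiver should run
    constraints: list[str]  # limits the receiver must respect
    rejected_alternatives: list[RejectedAlternative] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to the JSON object that travels to the next step."""
        return json.dumps(asdict(self), indent=2)


def new_trace_id() -> str:
    # Hypothetical ID scheme: any stable, unique string would do.
    return uuid.uuid4().hex


# Illustrative values only.
payload = HandoffPayload(
    trace_id=new_trace_id(),
    plan=["drain traffic from api-prod", "restart-service(api-prod)"],
    constraints=["no schema changes", "finish within 5 minutes"],
    rejected_alternatives=[
        RejectedAlternative("roll back last deploy", "deploy predates the errors"),
    ],
)
print(payload.to_json())
```

Keeping rejected alternatives as structured entries rather than free text lets the receiving step look them up programmatically.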

GitHub Copilot's coding agent uses a PR as the canonical handoff: the plan is in the PR description, the decision rationale is in the commit messages and PR comments, the outcome is in the CI run and the post-merge metrics. The structure is intentional — every artifact has a durable URL.

Why the rationale matters more than the answer

Two patterns of multi-agent failure trace back to under-documented handoffs:

The receiving agent re-litigates the same decision because it doesn't know what alternatives were rejected and why.
The reviewer rubber-stamps because the PR shows only the final diff, not the path taken.

A decision record with rejected alternatives prevents both. Think of it as an Architecture Decision Record (ADR) at the granularity of a single agent step.
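A minimal sketch of such a step-level decision record follows, assuming a simple in-memory structure. The field names, the `record_decision` and `already_rejected` helpers, and the sample rationale are illustrative assumptions, not a format taken from any ADR tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    trace_id: str
    actor: str                # agent or human responsible
    choice: str
    rationale: str
    rejected: dict[str, str]  # alternative -> reason it was rejected
    timestamp: str            # ISO 8601, UTC


def record_decision(trace_id, actor, choice, rationale, rejected):
    """Build a decision record stamped with the current UTC time."""
    return DecisionRecord(
        trace_id=trace_id,
        actor=actor,
        choice=choice,
        rationale=rationale,
        rejected=dict(rejected),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


def already_rejected(record, proposal):
    """Return the recorded reason if `proposal` was already ruled out, else None."""
    return record.rejected.get(proposal)


decision = record_decision(
    trace_id="trace-001",  # placeholder value
    actor="planner/v1",    # hypothetical agent name
    choice="restart-service(api-prod)",
    rationale="restart is the least invasive fix available",
    rejected={"scale out api-prod": "does not address the failing process"},
)

# A receiving agent can check before reopening a settled question.
print(already_rejected(decision, "scale out api-prod"))
print(already_rejected(decision, "restart-service(api-prod)"))
```

The lookup addresses the first failure pattern; showing the `rejected` map alongside the diff gives a reviewer the path taken, which addresses the second.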

Closing the loop with outcome logs

The canonical agent loop ends in observe → reflect. Without an outcome log, "observe" is unverifiable. Every action an agent takes should produce a record of the form:

trace_id: 9f1c…
agent: incident-responder/v2
action: restart-service(api-prod)
expected: 503 rate < 1%
observed: 503 rate = 0.4% after 90s
status: success

This record is what lets a downstream analytics or eval pipeline answer "did this fix actually fix anything?" — the question that turns a one-shot tool into a learning system.
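To make the "did this fix actually fix anything?" question concrete, here is a small sketch that builds records in the shape shown above and computes a per-agent success rate. It assumes the expectation is a simple upper bound on one metric; `log_outcome`, `success_rate`, and the second sample entry are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    trace_id: str
    agent: str
    action: str
    expected: str  # human-readable target
    observed: str  # human-readable observation
    status: str    # "success" or "failure"


def log_outcome(log, trace_id, agent, action, metric, threshold, observed_value, unit="%"):
    """Compare an observed value to an upper-bound threshold and append the result."""
    status = "success" if observed_value < threshold else "failure"
    entry = Outcome(
        trace_id=trace_id,
        agent=agent,
        action=action,
        expected=f"{metric} < {threshold}{unit}",
        observed=f"{metric} = {observed_value}{unit}",
        status=status,
    )
    log.append(entry)
    return entry


def success_rate(log, agent):
    """Fraction of an agent's logged actions that met their expected outcome."""
    entries = [e for e in log if e.agent == agent]
    if not entries:
        return None
    return sum(e.status == "success" for e in entries) / len(entries)


outcomes = []
log_outcome(outcomes, "trace-001", "incident-responder/v2",
            "restart-service(api-prod)", "503 rate", 1, 0.4)
log_outcome(outcomes, "trace-002", "incident-responder/v2",
            "restart-service(api-prod)", "503 rate", 1, 2.5)
print(success_rate(outcomes, "incident-responder/v2"))  # 0.5
```

Deriving `status` from the comparison, rather than letting the agent report it, is what keeps the log from recording a claim instead of an observation.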

Exam tip: When a scenario asks how to make a handoff "reviewable" or "auditable", look for an answer that names all three artifacts (handoff payload + decision record + outcome log) and links them by a trace ID.
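One way to picture "linked by a trace ID" is an audit helper that gathers all three artifacts for a run and reports any that are missing. The sketch below assumes each artifact is a plain dictionary carrying a `trace_id` key; the `audit_trace` function and the sample stores are hypothetical.

```python
def audit_trace(trace_id, handoffs, decisions, outcome_entries):
    """Collect the three artifacts for one trace ID and report which are missing."""
    found = {
        "handoff_payload": [h for h in handoffs if h["trace_id"] == trace_id],
        "decision_record": [d for d in decisions if d["trace_id"] == trace_id],
        "outcome_log": [o for o in outcome_entries if o["trace_id"] == trace_id],
    }
    missing = [name for name, items in found.items() if not items]
    return {
        "trace_id": trace_id,
        "auditable": not missing,
        "missing": missing,
        "artifacts": found,
    }


# Illustrative in-memory stores; trace-002 has a handoff but nothing else.
handoff_store = [
    {"trace_id": "trace-001", "plan": ["restart-service(api-prod)"]},
    {"trace_id": "trace-002", "plan": ["rotate credentials"]},
]
decision_store = [{"trace_id": "trace-001", "choice": "restart-service(api-prod)"}]
outcome_store = [{"trace_id": "trace-001", "status": "success"}]

complete = audit_trace("trace-001", handoff_store, decision_store, outcome_store)
partial = audit_trace("trace-002", handoff_store, decision_store, outcome_store)
print(complete["auditable"], complete["missing"])  # True []
print(partial["auditable"], partial["missing"])    # False ['decision_record', 'outcome_log']
```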

Quick check

An agent finishes a planning phase and hands off to an execution agent. What's the minimum payload the handoff should carry?

Where this shows up on the exam

GH-600 tends to ask about handoffs indirectly — through audit, debugging, or post-incident review scenarios. The answer is almost always the option that preserves rationale and links artifacts by a stable identifier, not the one that maximises information density inside a single message.

Anchor concepts

Key terms

Handoff: An explicit transfer of control and context from one agent (or workflow step) to another, recorded as a durable artifact.
Decision record: A short, structured note capturing the choice made, the alternatives considered, the rationale, and the agent or human responsible.
Outcome log: A persisted record of what actually happened after an action, used to close the observe → reflect loop and to compute success rates.
Trace ID: A stable identifier that links every model call, tool invocation, and decision in a run so a reviewer can reconstruct the sequence later.

Watch out

Common pitfalls

Passing only the final answer between agents — the receiver loses the reasoning and the rejected alternatives, then re-litigates them.
Storing decisions only in chat history that rotates — by the time someone audits, the rationale is gone.
Treating the PR description as the entire decision record, with no link to traces, eval results, or the rejected branches.
Logging the action but not the outcome, so you can't tell whether the agent actually solved the problem or just claimed it did.
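The second and fourth pitfalls both come down to persistence. As one possible approach, the sketch below appends decisions and outcomes to a JSON Lines file and reads them back by trace ID. The file name, the `kind` field, and both helpers are assumptions for illustration; a temporary directory stands in for whatever durable store a real system would use.

```python
import json
import tempfile
from pathlib import Path


def append_record(path, record):
    """Append one record as a single JSON line; earlier lines are never rewritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")


def records_for(path, trace_id):
    """Read back every record that carries the given trace ID."""
    with path.open(encoding="utf-8") as f:
        rows = [json.loads(line) for line in f if line.strip()]
    return [r for r in rows if r.get("trace_id") == trace_id]


# Temporary location keeps the example self-contained; it is not durable.
store = Path(tempfile.mkdtemp()) / "decisions.jsonl"
append_record(store, {"trace_id": "trace-001", "kind": "decision",
                      "choice": "restart-service(api-prod)"})
append_record(store, {"trace_id": "trace-001", "kind": "outcome",
                      "status": "success"})
append_record(store, {"trace_id": "trace-002", "kind": "decision",
                      "choice": "no action"})

print([r["kind"] for r in records_for(store, "trace-001")])  # ['decision', 'outcome']
```

An append-only file does not rotate the way chat history does, so the rationale is still there when someone audits.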