Identify Actions That Require Human Judgment
Human-in-the-loop (HITL) is not free: every gate adds latency and review burden. This topic teaches the four signals that an action genuinely needs human judgment (rather than just more evals or better prompts), so you can place gates where they earn their keep.
Not every risky action needs a human. Some need a stronger eval. Some need a smaller blast radius. The exam tests whether you can tell the difference, and whether you place HITL gates only where they will be used well.
Four signals that judgment is required
Add a HITL gate when any of these is true:
- Irreversible and externally visible: sending an email, charging a card, publishing a release.
- Touches a regulated decision: denying a loan, classifying a medical input, anything with an audit obligation.
- Trades off values the model cannot weigh: picking which customer to apologise to first, choosing between two equally correct refactors.
- Creates a new commitment to a third party: signing a contract clause, scheduling a meeting on someone else's calendar, naming a partner in public copy.
If none of these is true, the answer is probably better evals or a smaller blast radius, not a human gate.
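The four signals amount to an any-of check. The sketch below is a hypothetical illustration, not a Foundry or Copilot API: `ProposedAction` and its boolean flags are assumptions, and deciding how each flag gets set (tool metadata, a static policy table, a classifier) is the hard part left to you. Note that the first signal only fires when both properties hold together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical description of an action an agent wants to take.

    Each flag mirrors one of the four signals; how the flags are set
    (policy table, tool metadata, classifier) is an assumption left to the caller.
    """
    name: str
    irreversible: bool = False
    externally_visible: bool = False
    regulated_decision: bool = False
    value_tradeoff: bool = False
    third_party_commitment: bool = False


def judgment_signals(action: ProposedAction) -> list[str]:
    """Return the names of the signals that apply to this action."""
    signals = []
    # Signal 1 requires BOTH properties: irreversible and externally visible.
    if action.irreversible and action.externally_visible:
        signals.append("irreversible_and_external")
    if action.regulated_decision:
        signals.append("regulated_decision")
    if action.value_tradeoff:
        signals.append("value_tradeoff")
    if action.third_party_commitment:
        signals.append("third_party_commitment")
    return signals


def needs_hitl_gate(action: ProposedAction) -> bool:
    """Gate when ANY signal applies; otherwise prefer evals or a smaller blast radius."""
    return bool(judgment_signals(action))


if __name__ == "__main__":
    send_email = ProposedAction("send_customer_email", irreversible=True, externally_visible=True)
    draft_email = ProposedAction("draft_customer_email")
    print(needs_hitl_gate(send_email), judgment_signals(send_email))
    print(needs_hitl_gate(draft_email))
```

Returning the signal names rather than a bare boolean lets the gate tell the reviewer why it fired, which also keeps the reason auditable.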
Workflow agents and HITL
Microsoft Foundry's workflow agents explicitly support human-in-the-loop steps as a first-class node in the graph. GitHub Copilot's coding agent uses the pull request as its HITL gate: the agent never merges directly and always proposes a PR for a human to review. Both designs converge on the same idea: the gate is a real, named step with a real, named approver.
Exam tip: if a scenario says "the agent silently committed and pushed to main," the failure is not a missing eval. The failure is a missing HITL gate at the externally visible boundary.
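The propose-then-approve pattern can be sketched in plain Python. This is not the Foundry or GitHub API: `Proposal`, `agent_propose`, `human_approve`, and `merge` are hypothetical names, and a list stands in for the `main` branch. The design point is that the agent has no code path to `main` except through a decision recorded by a named approver, so "silently pushed to main" cannot happen in this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    """Hypothetical stand-in for a pull request: a change the agent wants, not yet applied."""
    title: str
    diff: str
    approver: str                      # the named human accountable for this decision
    approved_by: Optional[str] = None  # set only by an explicit human decision


class ApprovalRequired(Exception):
    """Raised when something tries to cross the boundary without approval."""


def agent_propose(title: str, diff: str, approver: str) -> Proposal:
    # The agent's only write path: it can open a proposal but cannot apply it.
    return Proposal(title=title, diff=diff, approver=approver)


def human_approve(proposal: Proposal, reviewer: str) -> None:
    # Only the named approver can record approval; anyone else is refused.
    if reviewer != proposal.approver:
        raise PermissionError(f"{reviewer} is not the named approver for {proposal.title!r}")
    proposal.approved_by = reviewer


def merge(proposal: Proposal, main: list[str]) -> None:
    # The externally visible boundary: nothing lands on main without a recorded approval.
    if proposal.approved_by is None:
        raise ApprovalRequired(f"{proposal.title!r} is waiting on {proposal.approver}")
    main.append(proposal.diff)


if __name__ == "__main__":
    main_branch: list[str] = []
    pr = agent_propose("Fix null check", "guard against missing user", approver="alice")
    try:
        merge(pr, main_branch)  # blocked: no approval recorded yet
    except ApprovalRequired as err:
        print("blocked:", err)
    human_approve(pr, "alice")
    merge(pr, main_branch)  # now the change lands
    print(main_branch)
```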
Pick the right gate
Your team has built an agent that triages support tickets and proposes resolutions. A new request type has appeared: issue a refund up to $500. Where do you place the HITL gate?
Where this shows up on the exam
GH-600 likes to give you a scenario where the agent makes the technically correct call that should have been a human's. The clue is always one of the four signals above. Naming the signal is half the answer.
Key terms
- Human-in-the-loop (HITL): A control pattern where the agent pauses and waits for an explicit human decision before continuing.
- Judgment-required action: An action whose correctness depends on context, taste, or accountability that the model cannot supply (e.g., approving a refund, naming a customer publicly, deciding which P0 to fix first).
- Approval gate: The concrete UI or API mechanism (PR review, Slack approve button, signed token) that captures the human decision.
- Gate fatigue: When too many low-value approvals cause humans to rubber-stamp, defeating the HITL guarantee.
Common pitfalls
- Adding a HITL gate to every action: humans rubber-stamp and the guarantee evaporates. Gates need to be rare enough that they get genuine attention.
- Confusing HITL with manual review of logs after the fact. HITL is *blocking*: the action does not happen until a human says so.
- Picking the wrong human. The reviewer must have both the authority and the context to make the call; a generic on-call engineer cannot meaningfully approve a legal disclosure.
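The last two pitfalls can be addressed in one gate: it blocks before the action runs, and it only counts approvals from a reviewer role with authority for that kind of decision. This is a hypothetical sketch; the `AUTHORIZED_ROLES` table, the role names, and the `ask_human` callback (assumed to block until a person responds, for example through a review UI) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy: which reviewer roles hold authority for which action domain.
AUTHORIZED_ROLES: dict[str, set[str]] = {
    "legal_disclosure": {"legal_counsel"},
    "loan_denial": {"credit_officer"},
}


@dataclass(frozen=True)
class Decision:
    reviewer_role: str
    approved: bool


def run_gated(domain: str, execute: Callable[[], str],
              ask_human: Callable[[str], Decision]) -> str:
    """Blocking HITL gate: `execute` runs only after an authorized approval.

    `ask_human` is assumed to block until a human responds; here it is
    just a callable so the control flow is easy to see.
    """
    decision = ask_human(domain)  # the action has NOT happened yet
    # Unknown domains get an empty set, so the gate fails closed.
    allowed = AUTHORIZED_ROLES.get(domain, set())
    if decision.reviewer_role not in allowed:
        # Wrong human: an approval from someone without authority does not count.
        return f"blocked: {decision.reviewer_role} lacks authority for {domain}"
    if not decision.approved:
        return "blocked: rejected by reviewer"
    return execute()
```

Unlike reviewing logs after the fact, `execute` is simply never called until an authorized approval arrives; an approval from the wrong role is treated the same as no approval.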