Tune Instructions, Workflows and Constraints
Tuning an agent is rarely about changing the model; it is about tightening the instructions, the workflow steps, and the constraints around tool use. This topic walks through how to iterate on those levers safely, using evaluators as a guard.
Most "we need a better model" instincts are misdiagnoses. The cheaper, safer levers are instructions, workflow steps, and constraints. Microsoft Foundry's agent model (Model + Instructions + Tools) makes the levers explicit, and version snapshotting makes iteration reversible.
The four levers, from cheapest to most expensive
| Lever | When to use it | Risk |
| --- | --- | --- |
| Instructions edit | Behaviour drift, missing reminders, formatting | Bloated prompt, contradictions |
| Workflow / constraint change | Missing or mis-ordered steps | More rigid agent, less flexibility |
| Tool config change | Wrong tool selected, bad schema, ambiguous description | Breaks existing flows that depended on the old behaviour |
| Model swap | Reasoning failures the other levers cannot fix | Largest blast radius; revalidate everything |
The rule: change one lever at a time, re-run a fixed evaluation dataset, and keep the version snapshot so you can revert.
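The rule can be sketched as a small keep-or-revert loop. This is a minimal illustration, not Foundry's API: `AgentConfig`, `pass_rate`, `try_tune`, and the toy evaluator `toy_run_case` are all hypothetical names, and the "keep only if the score strictly improves" policy is an assumption you may want to tighten.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical snapshot of the four levers (not a Foundry type)."""
    model: str
    instructions: str
    workflow: tuple[str, ...]
    tools: tuple[str, ...]


def pass_rate(config: AgentConfig, dataset: list[dict],
              run_case: Callable[[AgentConfig, dict], bool]) -> float:
    """Fraction of cases in the fixed dataset that the config passes."""
    if not dataset:
        raise ValueError("evaluation dataset must not be empty")
    return sum(1 for case in dataset if run_case(config, case)) / len(dataset)


def try_tune(baseline: AgentConfig, candidate: AgentConfig, dataset: list[dict],
             run_case: Callable[[AgentConfig, dict], bool]):
    """Score both versions on the same dataset; keep the candidate only if it wins."""
    base_score = pass_rate(baseline, dataset, run_case)
    cand_score = pass_rate(candidate, dataset, run_case)
    kept = candidate if cand_score > base_score else baseline  # otherwise revert
    return kept, base_score, cand_score


def toy_run_case(config: AgentConfig, case: dict) -> bool:
    # Stand-in evaluator: a case passes if it needs no ownership check,
    # or if the workflow contains a verification step.
    return (not case["needs_verification"]) or "verify_ownership" in config.workflow


# Fixed evaluation dataset: never edited between runs.
dataset = [
    {"needs_verification": True},
    {"needs_verification": True},
    {"needs_verification": False},
    {"needs_verification": False},
]

baseline = AgentConfig(
    model="model-a",
    instructions="Handle refund requests.",
    workflow=("issue_refund",),
    tools=("issue_refund",),
)
# One lever changed: the workflow gains a verification step.
candidate = replace(baseline, workflow=("verify_ownership", "issue_refund"))

kept, base_score, cand_score = try_tune(baseline, candidate, dataset, toy_run_case)
print(base_score, cand_score, kept is candidate)
```

Because `baseline` is never mutated, reverting is just a matter of continuing to use it, which is the role a version snapshot plays in a real deployment.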
Try a tuning and see the outcome
Your refund agent occasionally issues refunds without first verifying the customer owns the order. Evaluation shows the failure clusters in one mode. You have to pick a tune.
Which tune do you try first?
Quick check
You change four things in the agent (model, instructions, tool list, and workflow branch) and re-run the eval. The score improves. What is wrong with this tune?
Where this shows up on the exam
GH-600 questions will hand you a failure cluster and four candidate tunes. The right answer is almost always the cheapest lever that actually addresses the failure mode, and it is almost never "swap the model". When the failure is a missing step, the right answer is a workflow constraint, not a longer prompt.
Key terms
- Instructions: The agent's prompt-based definition of goals, constraints, and behaviour. In Microsoft Foundry, prompt agents are defined almost entirely through instructions plus tool config.
- Workflow agent: A declarative orchestration of multiple steps or agents, built visually or in YAML, that supports branching, human-in-the-loop, and group-chat patterns.
- Constraint: An explicit rule the agent must follow: required tool, forbidden action, mandatory verification step, output schema, refusal condition.
- Tuning loop: Iterative cycle of observe failures → form a hypothesis → change one lever (instruction, workflow step, constraint) → re-evaluate against a fixed dataset → keep or revert.
- Versioning: Foundry snapshots agent versions automatically so you can roll back or A/B compare; tuning without versioning makes regressions irreversible.
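A constraint such as "mandatory verification step" is easiest to tune against when an evaluator can check it mechanically. The sketch below assumes each evaluation run yields an ordered list of tool-call names; the tool names (`lookup_order`, `verify_ownership`, `issue_refund`) and the trace format are hypothetical, not taken from Foundry.

```python
def violates_verify_before_refund(tool_calls: list[str]) -> bool:
    """Return True if any refund is issued before ownership was verified."""
    verified = False
    for name in tool_calls:
        if name == "verify_ownership":
            verified = True
        elif name == "issue_refund" and not verified:
            return True
    return False


# Hypothetical traces recorded from a fixed evaluation dataset.
traces = {
    "ok": ["lookup_order", "verify_ownership", "issue_refund"],
    "skipped": ["lookup_order", "issue_refund"],
    "late": ["issue_refund", "verify_ownership"],
    "no_refund": ["lookup_order"],
}

# Names of the cases that break the constraint.
failures = sorted(
    case for case, calls in traces.items() if violates_verify_before_refund(calls)
)
print(failures)
```

A check like this turns "the agent sometimes skips verification" into a countable failure mode, so a workflow or constraint change can be judged by whether that count drops to zero.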
Common pitfalls
- Changing several levers at once. When the score moves you cannot tell which change caused it.
- Tuning to the model output you saw last time instead of to a fixed evaluation dataset, which means you optimise for noise.
- Adding more text to the instructions for every failure. The instructions become bloated and contradictory, and output quality drops.
- Skipping versioning and snapshotting, so rolling back a bad tune means rewriting it from memory.
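The first pitfall can be caught before an evaluation run by diffing two snapshots and refusing any tune that touches more than one lever. This is an illustrative guard only: `AgentSnapshot`, `changed_levers`, and `assert_single_lever` are hypothetical helpers, and real snapshots would come from whatever versioning your platform provides.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class AgentSnapshot:
    """Hypothetical record of one agent version, one field per lever."""
    model: str
    instructions: str
    workflow: tuple[str, ...]
    tools: tuple[str, ...]


def changed_levers(old: AgentSnapshot, new: AgentSnapshot) -> list[str]:
    """Names of the levers whose values differ between two snapshots."""
    return [f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)]


def assert_single_lever(old: AgentSnapshot, new: AgentSnapshot) -> str:
    """Return the one changed lever, or raise if zero or several changed."""
    changed = changed_levers(old, new)
    if len(changed) != 1:
        raise ValueError(f"expected exactly one changed lever, got {changed}")
    return changed[0]


v1 = AgentSnapshot(
    model="model-a",
    instructions="Handle refund requests.",
    workflow=("issue_refund",),
    tools=("issue_refund",),
)
# Acceptable tune: only the instructions change.
v2 = replace(v1, instructions="Handle refund requests. Verify ownership first.")
# Unacceptable tune: model and instructions change together.
v3 = replace(v2, model="model-b")

print(assert_single_lever(v1, v2))
print(changed_levers(v1, v3))
```

Running the guard as the first step of the evaluation pipeline means a score change can always be attributed to a single, named lever.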