Weight 18% · 7 topics
Evaluation, Error Analysis & Tuning
Define success signals, analyze failures by category, and tune instructions, workflows and tool usage.
- 1. Define Success Criteria and Signals
  Before you can evaluate an agent you have to decide what 'good' looks like. This topic teaches you to translate a business goal into measurable success criteria and the concrete signals (task completion, tool-call accuracy, groundedness, safety) that prove the agent met them.
  ⏱ 8 min · +40 XP · medium
- 2. Quantitative vs Qualitative Signals
  Healthy agent programmes combine cheap quantitative signals that run on every build with slower qualitative signals that catch what numbers can't see. This topic shows how to pair them so neither side ships blind.
  ⏱ 7 min · +40 XP · medium
- 3. Automated Scanning and Regression Detection
  Automated scanning is how you catch the bad change before users do. This topic covers running evaluators in CI/CD, gating releases on thresholds, scheduling production evaluation, and using continuous evaluation plus red teaming to detect regressions and drift.
  ⏱ 9 min · +45 XP · medium
- 4. Failure Analysis from Traces and Artifacts
  When an agent run goes wrong, the trace is your crime scene. This topic shows how to read a Foundry / OpenTelemetry-style trace, identify the failing span, and use the surrounding artifacts (instructions, tool args, retrieved context, model output) to assign root cause.
  ⏱ 10 min · +50 XP · hard
- 5. Classify Failure Modes: Reasoning, Tool, Context
  A useful failure taxonomy turns a vague 'the agent broke' into an actionable fix. This topic teaches the three core agent failure modes (reasoning, tool, and context) and how to match observable signals to each so you fix the right layer.
  ⏱ 9 min · +50 XP · hard
- 6. Tune Instructions, Workflows and Constraints
  Tuning an agent is rarely about changing the model; it is about tightening the instructions, the workflow steps, and the constraints around tool use. This topic walks through how to iterate on those levers safely using evaluators as a guard.
  ⏱ 10 min · +50 XP · medium
- 7. Refine Memory and Tool Usage
  Once instructions and workflow are tight, the next levers are the agent's memory layer and its tool catalog. This topic shows how to refine what the agent remembers and which tools it can pick from, using evaluator signals to decide what to add, prune, or scope.
  ⏱ 9 min · +45 XP · medium