Block Policy Violations at the Boundary
Policies are useless if they live only in the prompt. This topic shows where to enforce them โ at the agent's tool, network, and runtime boundary โ so a violation is *blocked* rather than *reported after the fact*. You will learn to recognise the anti-patterns that make policies cosmetic.
Block Policy Violations at the Boundary
A policy that lives in the prompt is a suggestion. A policy enforced in the tool wrapper, the egress firewall, or the content filter is a guarantee. The exam tests whether you can spot the difference.
The boundary layers
| Layer | What it blocks | Hard or soft? | | --- | --- | --- | | System prompt | Most well-formed requests | Soft โ bypassable by injection | | Tool wrapper | Disallowed parameters, shape violations | Hard | | MCP server / API gateway | Disallowed methods, scopes, paths | Hard | | Egress firewall | Disallowed network destinations | Hard | | Content filter | Unsafe inputs/outputs (severity-tiered) | Hard | | Audit log | Nothing โ it only records | Detection only |
Exam tip: when an option says "instruct the model to..." or "tell the agent never to...", it is almost always the wrong answer. Look for the layer outside the model.
Spot the anti-patterns
Spot the anti-patterns
Click the lines where the policy is *not* actually enforced at a boundary.
- 1system_prompt = 'You must never call URLs outside example.com.'
- 2if not url.startswith('https://example.com/'): raise PolicyError
- 3logger.warning('agent called external URL: %s', url) # but we still call it
- 4egress_firewall.allow_list = ['example.com'] # blocked at network layer
- 5prompt += '\nReminder: do NOT share customer email addresses.'
- 6output = pii_redactor.redact(model_output) # before send
- 7content_filter.severity_threshold = 'high' # block hate/violence at runtime
- 8if model.says('I will not share PII'): trust_it = True
Where this shows up on the exam
You will see options framed as prompt-only fixes vs boundary fixes. The correct answer almost always pushes the control outside the model โ into a wrapper, a filter, or the network. Anything starting with "tell the model to..." is suspect.
Key terms
- Policy boundary
- The runtime layer (tool wrapper, MCP server, content filter, network egress) where a policy is enforced. Policies enforced only in the prompt are not at a boundary.
- Soft enforcement
- A policy expressed as a system-prompt instruction. The model usually follows it, but a jailbreak or hallucination can route around it.
- Hard enforcement
- A policy enforced by code outside the model: the call simply fails. This is what 'blocked at the boundary' means.
- Content filter
- A runtime safety classifier (in Foundry/Azure OpenAI, configurable severity thresholds) that blocks unsafe inputs and outputs before they reach the user or downstream tools.
Common pitfalls
- Writing the policy only in the system prompt. A capable model usually follows it; a jailbreak or a confused agent ignores it. The boundary must enforce in code.
- Logging the violation instead of blocking it. Detection without prevention is a paper trail of failures, not a guardrail.
- Hand-rolling allow-lists in the prompt instead of in the tool wrapper. The wrapper can fail closed; the prompt cannot.