8 min readmedium+45 XP

Block Policy Violations at the Boundary

Policies are useless if they live only in the prompt. This topic shows where to enforce them — at the agent's tool, network, and runtime boundary — so a violation is *blocked* rather than *reported after the fact*. You will learn to recognise the anti-patterns that make policies cosmetic.

After this topic, you'll be confident about Policy boundary, Soft enforcement, Hard enforcement and 1 more concept.

Block Policy Violations at the Boundary

A policy that lives in the prompt is a suggestion. A policy enforced in the tool wrapper, the egress firewall, or the content filter is a guarantee. The exam tests whether you can spot the difference.

The boundary layers

Layer	What it blocks	Hard or soft?
System prompt	Most well-formed requests	Soft — bypassable by injection
Tool wrapper	Disallowed parameters, shape violations	Hard
MCP server / API gateway	Disallowed methods, scopes, paths	Hard
Egress firewall	Disallowed network destinations	Hard
Content filter	Unsafe inputs/outputs (severity-tiered)	Hard
Audit log	Nothing — it only records	Detection only

Exam tip: when an option says "instruct the model to..." or "tell the agent never to...", it is almost always the wrong answer. Look for the layer outside the model.

Spot the anti-patterns

+45 XP

Click the lines where the policy is *not* actually enforced at a boundary.

1system_prompt = 'You must never call URLs outside example.com.'
2if not url.startswith('https://example.com/'): raise PolicyError
3logger.warning('agent called external URL: %s', url) # but we still call it
4egress_firewall.allow_list = ['example.com'] # blocked at network layer
5prompt += '\nReminder: do NOT share customer email addresses.'
6output = pii_redactor.redact(model_output) # before send
7content_filter.severity_threshold = 'high' # block hate/violence at runtime
8if model.says('I will not share PII'): trust_it = True

0 lines flagged

Where this shows up on the exam

You will see options framed as prompt-only fixes vs boundary fixes. The correct answer almost always pushes the control outside the model — into a wrapper, a filter, or the network. Anything starting with "tell the model to..." is suspect.

Anchor concepts

Key terms

Policy boundary: The runtime layer (tool wrapper, MCP server, content filter, network egress) where a policy is enforced. Policies enforced only in the prompt are not at a boundary.
Soft enforcement: A policy expressed as a system-prompt instruction. The model usually follows it, but a jailbreak or hallucination can route around it.
Hard enforcement: A policy enforced by code outside the model: the call simply fails. This is what 'blocked at the boundary' means.
Content filter: A runtime safety classifier (in Foundry/Azure OpenAI, configurable severity thresholds) that blocks unsafe inputs and outputs before they reach the user or downstream tools.

Watch out

Common pitfalls

Writing the policy only in the system prompt. A capable model usually follows it; a jailbreak or a confused agent ignores it. The boundary must enforce in code.
Logging the violation instead of blocking it. Detection without prevention is a paper trail of failures, not a guardrail.
Hand-rolling allow-lists in the prompt instead of in the tool wrapper. The wrapper can fail closed; the prompt cannot.