Refine Memory and Tool Usage

Once instructions and workflow are tight, the next levers are the agent's memory layer and its tool catalog. This topic shows how to refine what the agent remembers and which tools it can pick from — using evaluator signals to decide what to add, prune, or scope.

After this topic, you'll be confident about Memory layer, Tool catalog, Tool-call accuracy, Scope, and Memory hygiene.

After instructions and workflow, the next two levers worth pulling are the memory layer and the tool catalog. Both are easy to expand without much thought, and both quietly degrade agent quality when that happens. The refinement discipline is to add little, scope tight, and let evaluator signals tell you when to prune.

Refining the tool catalog

Microsoft Foundry's tool catalog includes built-ins (web search, file search, memory, code interpreter), custom functions, and MCP servers; a Toolbox lets you curate a single MCP-compatible endpoint of approved tools per agent. The signals that drive refinement:

Signal	Refinement
Tool-call accuracy drops after adding tools	Scope: hide tools the agent doesn't need, or split into a smaller per-agent Toolbox
Agent often picks the wrong tool from two similar ones	Improve tool descriptions and parameter names; consider merging
Tool returns errors the agent ignores	Document expected errors in the tool description; add handling in instructions
Tool succeeds with wrong args	Tighten the schema; add input validation in the wrapper
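
The last row ("tighten the schema; add input validation in the wrapper") can be sketched in plain Python. This is a minimal illustration, not Foundry's API: `lookup_order`, its parameters, the order-ID format, and the region list are all hypothetical. The idea is that the wrapper refuses malformed arguments with a message the agent can act on, instead of letting the call succeed with the wrong inputs.

```python
import re

# Hypothetical order-ID format and region list, assumed for illustration only.
ORDER_ID_PATTERN = re.compile(r"ORD-\d{6}")
ALLOWED_REGIONS = {"eu", "us", "apac"}


class ToolInputError(ValueError):
    """Raised when the agent passes arguments the tool should not accept."""


def validate_lookup_args(order_id: str, region: str) -> None:
    """Reject malformed arguments before the real tool runs."""
    if not ORDER_ID_PATTERN.fullmatch(order_id):
        raise ToolInputError(f"order_id must look like ORD-123456, got {order_id!r}")
    if region not in ALLOWED_REGIONS:
        raise ToolInputError(
            f"region must be one of {sorted(ALLOWED_REGIONS)}, got {region!r}"
        )


def lookup_order(order_id: str, region: str) -> dict:
    """Hypothetical tool wrapper: validate first, then call the backend."""
    validate_lookup_args(order_id, region)
    # Placeholder for the real backend call.
    return {"order_id": order_id, "region": region, "status": "found"}
```

The error messages name the expected format, so the same text can be reused in the tool description when documenting expected errors.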

Rule of thumb: every tool the agent never uses on a real task is a tool that lowers selection accuracy on the ones it does use.
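
One way to act on this rule of thumb is to compare the catalog against tool names seen in real traces. The sketch below assumes you can export a flat list of invoked tool names from your own logs; the export format and the tool names are assumptions, not something Foundry prescribes.

```python
from collections import Counter


def unused_tools(catalog: set[str], call_log: list[str]) -> list[str]:
    """Return catalog tools that never appear in the call log, sorted by name."""
    used = Counter(call_log)
    return sorted(tool for tool in catalog if used[tool] == 0)


# Hypothetical catalog and call log for illustration.
catalog = {"web_search", "file_search", "code_interpreter", "create_ticket"}
call_log = ["web_search", "file_search", "web_search"]
print(unused_tools(catalog, call_log))  # prints ['code_interpreter', 'create_ticket']
```

Tools that show up in this list over a representative period are candidates for hiding or moving out of the agent's Toolbox.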

Refining the memory layer

Memory turns into a context-failure machine when nobody owns it. Memory hygiene rules:

Write less: only persist facts that have to outlive the conversation.
Refresh on read: if a fact has a canonical owner (account record, profile), refresh it from the source on read instead of caching it forever.
Expire on time: every memory entry has a TTL, even if the TTL is "until the next session".
Never persist secrets: tokens, credentials, and PII that the agent saw in passing should not live in durable memory.
Prefer authoritative source: when memory disagrees with a system of record, the system of record wins.

When a failure is "agent used a stale fact", it is almost always a memory hygiene problem, not a model problem.
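
The hygiene rules above can be sketched as a toy in-memory store. This is an assumption-laden illustration, not Foundry's memory tool: `HygienicMemory`, `SECRET_MARKERS`, and the `source_of_record` callback are hypothetical names, and the key-name check is a crude stand-in for real secret or PII detection.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical key fragments that mark an entry as a secret; a real system
# would use a proper classification or data-loss-prevention check.
SECRET_MARKERS = ("token", "password", "secret", "api_key")


@dataclass
class Entry:
    value: str
    expires_at: float


class HygienicMemory:
    """Toy durable memory: mandatory TTL, no secrets, source of record wins."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._entries: dict[str, Entry] = {}
        self._clock = clock

    def write(self, key: str, value: str, ttl_seconds: float) -> bool:
        """Store a fact with a TTL; refuse keys that look like secrets."""
        if any(marker in key.lower() for marker in SECRET_MARKERS):
            return False
        self._entries[key] = Entry(value, self._clock() + ttl_seconds)
        return True

    def read(
        self,
        key: str,
        source_of_record: Optional[Callable[[str], Optional[str]]] = None,
    ) -> Optional[str]:
        """Prefer the system of record; otherwise return an unexpired entry."""
        if source_of_record is not None:
            fresh = source_of_record(key)
            if fresh is not None:
                return fresh
        entry = self._entries.get(key)
        if entry is None or entry.expires_at <= self._clock():
            self._entries.pop(key, None)  # drop expired entries on read
            return None
        return entry.value
```

Making `ttl_seconds` a required argument is the design choice that matters here: a caller cannot write an entry without deciding when it expires.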

Scope is a quality lever, not only a security lever

Granting a tool the full surface ("the whole org", "any file path") inflates the agent's choice space and the blast radius of mistakes. Scoping down to the minimum surface needed for the task improves tool-call accuracy and limits damage when something goes wrong — which is why Foundry exposes scope as a first-class configuration on connected tools and MCP servers.
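
To make the idea concrete, here is a minimal sketch of a scope check applied before a tool call runs. It does not reproduce Foundry's scope configuration; `ToolScope`, its fields, and the example paths are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ToolScope:
    """Hypothetical scope: allowed actions plus a single permitted path root."""

    allowed_actions: frozenset[str]
    root: PurePosixPath

    def permits(self, action: str, path: str) -> bool:
        """True only if the action is allowed and the path stays under root."""
        if action not in self.allowed_actions:
            return False
        target = PurePosixPath(path)
        if ".." in target.parts:
            # Refuse parent-directory hops rather than trying to resolve them.
            return False
        return target == self.root or self.root in target.parents


# Scope a file tool to read-only access on one repo.
scope = ToolScope(frozenset({"read"}), PurePosixPath("/repos/billing"))
print(scope.permits("read", "/repos/billing/src/app.py"))   # prints True
print(scope.permits("write", "/repos/billing/src/app.py"))  # prints False
print(scope.permits("read", "/repos/payroll/notes.txt"))    # prints False
```

A check like this narrows what a mistaken call can reach, which is the blast-radius half of the argument; the accuracy half comes from exposing fewer actions to the agent in the first place.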

Quick check

Tool-call accuracy drops 10 points after a release. The change log shows the team added 5 new MCP tools to the catalog. What is the most likely cause?

Where this shows up on the exam

Two recurring shapes: (1) tool-call accuracy drops after a catalog expansion, and the answer is to scope down or curate a Toolbox; (2) the agent answers using a stale value, and the answer is memory hygiene (expiry, refresh from source, source-of-truth precedence), not a model swap.

Anchor concepts

Key terms

Memory layer: Storage the agent uses across turns or sessions — short-term scratchpad, conversation history, durable user/profile memory — separate from the prompt.
Tool catalog: The set of tools available to an agent. In Foundry, tools include built-ins, custom functions, and MCP servers; a Toolbox curates a single MCP-compatible endpoint of approved tools.
Tool-call accuracy: A Foundry agent-specific evaluator that measures whether the agent selected the right tool with the right arguments for a given task.
Scope: The subset of a tool's surface (specific actions, repos, paths) that an agent is permitted to invoke. Reducing scope reduces the agent's search space and the blast radius of a mistake.
Memory hygiene: Discipline of writing only what is needed, expiring stale entries, and never persisting secrets — so memory does not become a source of subtle context failures.
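
As a rough intuition for the tool-call accuracy term, the sketch below scores labeled cases by exact match on tool name and arguments. It is a simplified stand-in, not Foundry's evaluator, which may score calls differently; `ToolCall` and `tool_call_accuracy` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict


def tool_call_accuracy(expected: list[ToolCall], actual: list[ToolCall]) -> float:
    """Fraction of cases where both tool name and arguments match exactly."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    if not expected:
        return 0.0
    hits = sum(
        1 for e, a in zip(expected, actual) if e.name == a.name and e.args == a.args
    )
    return hits / len(expected)


# Hypothetical labeled cases: one correct call, one wrong tool.
expected = [ToolCall("web_search", {"q": "refund policy"}), ToolCall("file_search", {"q": "sla"})]
actual = [ToolCall("web_search", {"q": "refund policy"}), ToolCall("web_search", {"q": "sla"})]
print(tool_call_accuracy(expected, actual))  # prints 0.5
```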

Watch out

Common pitfalls

Adding more tools to 'help' the agent. A larger catalog raises selection ambiguity and lowers tool-call accuracy.
Writing free-form chat history to durable memory. Old chatter becomes the wrong context tomorrow and triggers context failures.
Never expiring memory. The agent answers today using a fact that changed last week.
Granting a tool full scope (e.g., a whole org's repos) when the task only ever touches one repo. Mistakes get bigger.