Persist State and Detect Drift

Persisting agent state is half the job; the other half is noticing when that persisted state has drifted from reality. This topic covers durable state stores, version stamps, and the failure signals that tell you the agent is acting on a stale snapshot.

After this topic, you'll be confident about four concepts: persisted state, state drift, version stamp / ETag, and drift signal.

Durable state lets an agent survive restarts, scaling events, and human handoffs. Drift detection is what keeps that durable state honest. Both are required: persistence without drift detection just means the agent confidently acts on yesterday's world.

The persistence stack

| Layer | What you persist | Where |
| --- | --- | --- |
| Conversation thread | Messages, tool calls, plan | Managed thread store (e.g. Foundry threads, Cosmos DB) |
| Task state | Hypotheses, intermediate artifacts | Task-scoped store keyed by taskId |
| Cached external data | File contents, ticket bodies | Cache with version stamp (SHA, ETag, revision) |
| Long-term memory | Preferences, conventions | Memory store keyed by scope |

The thing that turns a cache into a drift detector is the version stamp stored next to the value. Without it, you cannot tell stale from fresh.
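
As a minimal sketch of that idea, the cache below keeps a version stamp beside every value and treats any stamp mismatch as staleness. The names `CachedEntry`, `StampedCache`, and `is_stale` are hypothetical, and the in-memory dictionary stands in for whatever durable store you actually use.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class CachedEntry:
    value: str
    version: str  # e.g. commit SHA, ETag, or ticket revision


class StampedCache:
    """In-memory stand-in for a cache that stores a version stamp per value."""

    def __init__(self) -> None:
        self._entries: Dict[str, CachedEntry] = {}

    def put(self, key: str, value: str, version: str) -> None:
        self._entries[key] = CachedEntry(value, version)

    def get(self, key: str) -> Optional[CachedEntry]:
        return self._entries.get(key)

    def is_stale(self, key: str, current_version: str) -> bool:
        # A missing entry or a stamp that differs from the source's current
        # stamp both mean the cached value cannot be trusted.
        entry = self._entries.get(key)
        return entry is None or entry.version != current_version
```

The comparison is equality only, so it works for stamps that have no ordering, such as commit SHAs or opaque ETags.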

The drift signals you must recognise

A well-instrumented agent listens for drift instead of pretending it can't happen.

| Signal | What it means | Right reaction |
| --- | --- | --- |
| 412 Precondition Failed / ETag mismatch | Source moved since you cached | Re-fetch, re-plan, retry |
| 409 Conflict on write | Concurrent writer beat you to it | Merge or re-plan against new state |
| Re-fetched value differs from cache | Cache invalidation was missed | Invalidate cache, log a drift event |
| Tool returns unexpected schema | Source schema migrated | Stop, surface to a human |
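
One way to wire the table into an agent's error path is a small classifier that maps each observed signal to its reaction. This is only a sketch: `Reaction` and `classify_drift` are hypothetical names, and the boolean flags assume the caller has already compared the re-fetched value with the cache and validated the tool's response schema.

```python
from enum import Enum
from typing import Optional


class Reaction(Enum):
    REFETCH_AND_REPLAN = "re-fetch, re-plan, retry"
    MERGE_OR_REPLAN = "merge or re-plan against new state"
    INVALIDATE_AND_LOG = "invalidate cache, log a drift event"
    ESCALATE_TO_HUMAN = "stop, surface to a human"


def classify_drift(
    status_code: Optional[int] = None,
    cache_mismatch: bool = False,
    schema_mismatch: bool = False,
) -> Optional[Reaction]:
    """Map an observed signal to the reaction from the table; None means no drift."""
    # A schema change is checked first because no automatic retry can fix it.
    if schema_mismatch:
        return Reaction.ESCALATE_TO_HUMAN
    if status_code == 412:
        return Reaction.REFETCH_AND_REPLAN
    if status_code == 409:
        return Reaction.MERGE_OR_REPLAN
    if cache_mismatch:
        return Reaction.INVALIDATE_AND_LOG
    return None
```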

Don't swallow drift signals

The tempting fix is to wrap every drift error in a retry. Resist. Each drift signal is operational information: it tells you the agent's state model is wrong. Log it, surface it, and only then re-plan. Silent retries hide the problem and let it compound.
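
The difference between a silent retry and a visible one can be sketched as follows. `DriftError`, `run_with_drift_handling`, and the `replan` callback are hypothetical names introduced for illustration; the point is that every drift signal is logged before any re-plan, and that the error is re-raised once a small re-plan budget is spent.

```python
import logging
from typing import Callable

logger = logging.getLogger("agent.drift")


class DriftError(Exception):
    """Hypothetical error a tool wrapper raises when it sees a drift signal."""

    def __init__(self, signal: str, resource: str) -> None:
        super().__init__(f"{signal} on {resource}")
        self.signal = signal
        self.resource = resource


def run_with_drift_handling(
    step: Callable[[], str],
    replan: Callable[[DriftError], None],
    max_replans: int = 2,
) -> str:
    for attempt in range(max_replans + 1):
        try:
            return step()
        except DriftError as err:
            # Log first, so the signal is never lost even if the retry succeeds.
            logger.warning(
                "drift detected: signal=%s resource=%s attempt=%d",
                err.signal, err.resource, attempt,
            )
            if attempt == max_replans:
                raise  # budget spent: surface the error instead of looping
            replan(err)
    raise ValueError("max_replans must be non-negative")
```

Bounding the re-plans keeps a persistent inconsistency from turning into an endless loop that hides the underlying problem.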

Where this shows up on the exam

Expect a scenario where the agent "did the right thing but the world had moved". The correct answer always involves a version stamp on persisted state plus a named drift signal (ETag, 409, 412) wired into the agent's error path.
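
A toy version of that scenario, assuming an in-memory store with integer revisions: `VersionedStore` and `write_if_match` are hypothetical names that imitate a conditional write (the If-Match pattern), returning 412 when the caller's revision no longer matches.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Record:
    body: str
    revision: int


class VersionedStore:
    """In-memory stand-in for a resource that supports conditional writes."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def create(self, key: str, body: str) -> int:
        self._records[key] = Record(body, 1)
        return 1

    def read(self, key: str) -> Record:
        rec = self._records[key]
        return Record(rec.body, rec.revision)  # copy, so callers hold a snapshot

    def write_if_match(self, key: str, body: str, expected_revision: int) -> Tuple[int, int]:
        """Return (status, current revision); 412 if the caller's stamp is stale."""
        current = self._records[key]
        if current.revision != expected_revision:
            return 412, current.revision
        current.body = body
        current.revision += 1
        return 200, current.revision
```

An agent that read revision 1, then writes after another writer has moved the record to revision 2, gets 412 back and knows it must re-fetch and re-plan rather than overwrite.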

Anchor concepts

Key terms

Persisted state
Agent state written to a durable store (database, blob, conversation thread) so it survives process restarts, scaling events, and handoffs.
State drift
The gap between the agent's persisted view of the world and the actual current state of the system it acts on.
Version stamp / ETag
An identifier that changes whenever the source changes (commit SHA, file ETag, ticket revision), stored alongside cached state so you can detect when the source has moved. It only needs to differ between versions; it does not need to be ordered.
Drift signal
An observable cue (a 412/409 from a tool, an unexpected diff, a re-fetched value that disagrees with the cache) that the agent's state is stale.
Watch out

Common pitfalls

  • Persisting state without a version stamp: you can't tell whether the cached file is current, and the agent applies edits to a stale base.
  • Treating a tool-call success as proof of freshness: a 200 from one tool doesn't mean a different system's cached state is still valid.
  • Catching drift signals and silently retrying: you lose the operational signal that the state model is wrong and just paper over the inconsistency.