Blog · June 12, 2026 · XSI-AIMS Atlas™

Why does XSI-AIMS™ have eight kinds of memory?

It is the first question every architect asks about the XSI-AIMS memory model — and the answer runs from a 180× latency spread in the published literature to the difference between agents you can debug and agents you can only re-run.

The question

Why does XSI-AIMS have eight kinds of memory? Every architect asks it within minutes of opening the memory section of the spec, because one store has always felt like enough — the RAG pipeline has a vector database (similarity-ranked retrieval over documents), the chat product has a session buffer, and both ship. So why does XSI-AIMS, an open standard for agent governance, define seven core memory subsystems plus an optional spatial profile — eight distinct contracts in all?

Because the stores are not interchangeable, and the published numbers say so. In the MemGraphRAG team's KDD 2026 evaluation (arXiv:2606.00610), their graph store retrieved answers to multi-hop questions in 0.061 seconds per query — while LightRAG, run on the same benchmarks, took 11.052 seconds, with HippoRAG between them at 1.586. That is a roughly 180× spread across published memory architectures doing nominally the same job. Memory architecture is not an implementation detail: it is the difference between an agent that answers and an agent that stalls.
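The spread is plain division, and it is worth checking rather than taking on faith. A quick sketch in Python, using only the three per-query latencies quoted above from arXiv:2606.00610 (the variable names are ours, chosen for illustration):

```python
# Per-query retrieval latency in seconds, as quoted above from arXiv:2606.00610.
latency_s = {"MemGraphRAG": 0.061, "HippoRAG": 1.586, "LightRAG": 11.052}

fastest = min(latency_s.values())

# Slowdown of each system relative to the fastest one.
slowdown = {name: secs / fastest for name, secs in latency_s.items()}

for name, factor in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{name}: {factor:.0f}x")
```

LightRAG comes out at about 181 times the MemGraphRAG latency, which is where the rounded 180× figure comes from.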

The eight subsystems

Eight contracts. Not one database.

XSI-AIMS v2.0 recognizes seven core memory subsystems plus an optional-profile eighth, because reliability, recall, consistency, and cost needs differ by subsystem. Here is what each one remembers, and what failure looks like when it lands in the wrong store.

The Procedural Memory Subsystem (PMS) remembers how the agent does what it does: recipes, skills, the workflow corpus. Procedures need reliability above everything — when the agent reaches for a skill, the skill must load, every time. Put procedures in a similarity-ranked document store and you get the failure every agent operator recognizes: the skill that fires only when the embedding math happens to favor it. A skill that loads most of the time is not a skill — it is a gamble that usually pays.
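To make the reliability contract concrete, here is a minimal sketch of an exact-key skill store. It is an illustration under our own assumptions, not the spec's interface: `ProceduralStore`, `SkillNotFound`, and the `refund_order` skill are hypothetical names.

```python
class SkillNotFound(LookupError):
    """Raised when a requested skill is not registered."""


class ProceduralStore:
    """Hypothetical exact-key skill store: a skill loads or the call fails."""

    def __init__(self):
        self._skills = {}

    def register(self, name, steps):
        # Store an immutable copy so later edits to `steps` cannot change the skill.
        self._skills[name] = tuple(steps)

    def load(self, name):
        # Exact-key lookup: no ranking, no threshold, no "closest match".
        try:
            return self._skills[name]
        except KeyError:
            raise SkillNotFound(name) from None


store = ProceduralStore()
store.register("refund_order", ["verify identity", "check policy", "issue refund"])
steps = store.load("refund_order")
```

The design choice is the failure mode: a missing or misnamed skill raises immediately instead of silently returning the nearest neighbor.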

The Knowledge Retrieval Subsystem (KRS) remembers what the agent knows about: facts, documents, references. Knowledge is the one subsystem where ranked retrieval is the right contract — finding the most relevant fact among millions tolerates approximation. The failure mode runs the other way: knowledge stuffed into session context, paid for on every call, aging with no consolidation path. This is also the crowded end of the research field (graph stores, KG-path retrievers, temporal trees), and the variance among published graph-RAG retrievers — MemGraphRAG, LightRAG, HippoRAG (all measured in arXiv:2606.00610) — is that 180× latency spread.

Working and Episodic are the session split. Earlier drafts carried one undivided session store — v2.0 retires it and splits it in two (RFC-0021), because what is happening now and what happened are different contracts. The Working Memory Subsystem holds per-invocation scratch: the working set, read-your-own-writes in milliseconds, and by design never indexed across sessions. Mis-store it in a persistent vector index and yesterday's scratch surfaces in today's call, ranked by similarity when what mattered was recency. The Episodic Memory Subsystem remembers what happened — a raw event tier and a consolidated tier, append-only, time-indexed, never overwritten in place. An audit trail that can be edited is not an audit trail, and for governance-grade agents the episodic store is the one a regulator eventually reads.
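The two contracts are easy to tell apart in code. The sketch below is our own illustration, not the spec's API: `working_memory`, `Episode`, and `EpisodicLog` are hypothetical names, and the in-memory list stands in for whatever backend an implementer declares.

```python
from bisect import bisect_left, bisect_right
from contextlib import contextmanager
from dataclasses import dataclass


@contextmanager
def working_memory():
    # Per-invocation scratch: created for one call, cleared when the call ends.
    scratch = {}
    try:
        yield scratch
    finally:
        scratch.clear()


@dataclass(frozen=True)
class Episode:
    ts: float   # event time, in seconds
    event: str


class EpisodicLog:
    """Append-only, time-indexed log. Records are frozen once written."""

    def __init__(self):
        self._episodes = []

    def append(self, ts, event):
        if self._episodes and ts < self._episodes[-1].ts:
            raise ValueError("episodes must be appended in time order")
        self._episodes.append(Episode(ts, event))

    def between(self, start, end):
        # Time-range query over the sorted timestamps, inclusive on both ends.
        times = [e.ts for e in self._episodes]
        return tuple(self._episodes[bisect_left(times, start):bisect_right(times, end)])


log = EpisodicLog()
with working_memory() as scratch:
    scratch["draft"] = "reply v1"            # visible only inside this call
    log.append(1.0, "drafted reply")
    log.append(2.0, "sent reply")
recent = log.between(1.5, 3.0)
```

Nothing in the log exposes an update or delete method, and the frozen dataclass rejects attribute assignment, which is the "never overwritten in place" property in miniature.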

The Entity Memory Subsystem (EMS) remembers entity facts over time — bi-temporal, entity-anchored, queryable as-of any date. The contract is temporal honesty: what did the agent know about this customer on March 3, not what does it know now. Fold entity facts into the episodic log and that question becomes a full-history scan — fold them into the knowledge index and the answer silently becomes the present tense.
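An as-of query is the clearest way to see what bi-temporal means: each fact carries both the date it became true and the date the agent recorded it. The sketch below is a hedged illustration with hypothetical names (`EntityFact`, `known_as_of`) and made-up customer data, not a description of any particular EMS backend.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EntityFact:
    entity: str
    attribute: str
    value: str
    valid_from: date    # when the fact became true in the world
    recorded_on: date   # when the agent learned it


def known_as_of(facts, entity, attribute, as_of):
    """Return the value the agent would have reported on `as_of`, or None."""
    candidates = [
        f for f in facts
        if f.entity == entity
        and f.attribute == attribute
        and f.recorded_on <= as_of   # the agent had already recorded it
        and f.valid_from <= as_of    # and it was already in effect
    ]
    if not candidates:
        return None
    # Prefer the most recently effective fact, then the most recently recorded.
    return max(candidates, key=lambda f: (f.valid_from, f.recorded_on)).value


facts = [
    EntityFact("customer-42", "tier", "standard", date(2026, 1, 10), date(2026, 1, 10)),
    # Upgrade took effect March 1 but was only recorded on March 5.
    EntityFact("customer-42", "tier", "premium", date(2026, 3, 1), date(2026, 3, 5)),
]
on_march_3 = known_as_of(facts, "customer-42", "tier", date(2026, 3, 3))
on_march_6 = known_as_of(facts, "customer-42", "tier", date(2026, 3, 6))
```

On March 3 the answer is `standard`, because the upgrade had not yet been recorded; by March 6 it is `premium`. A single present-tense index would return `premium` for both dates.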

Prospective and Reflective are the v2.0 additions, both core (RFC-0021, Accepted June 3, 2026). The Prospective Memory Subsystem holds future-directed intentions — triggers, reminders, commitments bound to a principal, a time, or a condition. Stored anywhere else, an intention is a sentence in a transcript: it fires only if retrieval happens to surface it at the right moment, and a reminder that depends on similarity search is a reminder that does not fire. The Reflective Memory Subsystem holds lessons learned — and what keeps it from being a diary is the exit path: lessons promote into procedure through a governed induction gate, not by an agent quietly rewriting its own skills. Leave lessons in the episodic log and the agent repeats the mistake. Skip the gate and every bad day edits how the agent works.
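A sketch of the prospective half, assuming the simplest case of a time-bound trigger. `Intention`, `ProspectiveStore`, and the sample commitments are hypothetical; condition-based triggers and the reflective induction gate are left out.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Intention:
    principal: str      # who the commitment is bound to
    action: str         # what the agent committed to do
    due_at: datetime    # when it must fire
    fired: bool = False


class ProspectiveStore:
    def __init__(self):
        self._intentions = []

    def commit(self, principal, action, due_at):
        self._intentions.append(Intention(principal, action, due_at))

    def fire_due(self, now):
        # Select by the time trigger alone; query wording plays no part.
        due = [i for i in self._intentions if not i.fired and i.due_at <= now]
        for intention in due:
            intention.fired = True  # each intention fires at most once
        return sorted(due, key=lambda i: i.due_at)


store = ProspectiveStore()
store.commit("customer-42", "send renewal notice", datetime(2026, 7, 1, 9, 0))
store.commit("customer-42", "close trial account", datetime(2026, 8, 1, 9, 0))
fired_now = store.fire_due(datetime(2026, 7, 2))
fired_again = store.fire_due(datetime(2026, 7, 2))
```

The reminder fires because its time arrived, not because a retrieval step happened to rank it highly, and it does not fire twice.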

The Spatial Memory Subsystem is the optional-profile eighth, for embodied and mapped domains: locality, paths, regions. Its queries are topological — adjacency, reachability, containment — and topology does not embed. Flatten a route into vectors and "what is near the loading dock" gets answered by whatever sounds like the loading dock.
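"Near" in this sense is a graph question. A minimal sketch, assuming a hand-written adjacency map of a made-up warehouse (the place names and `within_hops` are ours, for illustration only):

```python
from collections import deque

# Hypothetical floor plan: each place maps to the places directly connected to it.
adjacency = {
    "loading dock": {"aisle 1"},
    "aisle 1": {"loading dock", "aisle 2"},
    "aisle 2": {"aisle 1", "office"},
    "office": {"aisle 2"},
    "dock office": set(),   # sounds similar, but is not connected to the dock
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: places reachable from `start` in at most `max_hops` steps."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops


near_dock = within_hops(adjacency, "loading dock", 2)
```

The result contains `aisle 1` and `aisle 2` and omits `dock office`, which a name-similarity lookup would likely have ranked first.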

The bus

Same query, same stores — every time.

One name spans all eight without being a store: TMS, the Trust and Memory Provenance Subsystem. It is the bus — trust assignments, provenance records, and the promotion and consolidation contract that moves raw episodes into consolidated ones and gated lessons into procedure. Backends are swappable, per subsystem, declared at registration. The bus is not swappable. That asymmetry is deliberate, because the bus is where determinism lives: the XSI-AIMS memory contract guarantees that the same query, from the same agent, under the same declaration, routes to the same stores — every time — and the guarantee holds because routing, promotion, and provenance run through one governed contract rather than eight private ones.
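The routing half of that guarantee can be sketched in a few lines. This is an illustration under our own assumptions, not the TMS contract or the Atlas implementation: `Router`, the `Subsystem` enum, and the backend labels are hypothetical, and promotion and provenance are not modeled.

```python
from enum import Enum
from types import MappingProxyType


class Subsystem(Enum):
    PROCEDURAL = "procedural"
    KNOWLEDGE = "knowledge"
    WORKING = "working"
    EPISODIC = "episodic"
    ENTITY = "entity"
    PROSPECTIVE = "prospective"
    REFLECTIVE = "reflective"
    SPATIAL = "spatial"  # optional profile


class Router:
    def __init__(self, declaration):
        # Copy and freeze the declaration at registration time.
        self._table = MappingProxyType(dict(declaration))

    def route(self, subsystem):
        # Routing is a pure table lookup: same declaration, same answer.
        if subsystem not in self._table:
            raise LookupError(f"no backend declared for {subsystem.value}")
        return self._table[subsystem]


declaration = {
    Subsystem.KNOWLEDGE: "vector-index",
    Subsystem.EPISODIC: "append-only-log",
    Subsystem.ENTITY: "bitemporal-table",
}
router = Router(declaration)
first = router.route(Subsystem.EPISODIC)
declaration[Subsystem.EPISODIC] = "something-else"  # caller edits its own dict later
second = router.route(Subsystem.EPISODIC)
```

Both calls return `append-only-log`. Swapping a backend means registering a new declaration, and an undeclared subsystem is an error rather than a fallback to whichever store answers.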

Determinism reads like a small property until an incident lands. An agent misbehaves at 02:00 — was the failure in the model, the prompt, or what the agent remembered? If memory retrieval is nondeterministic, you cannot replay the incident, only re-run it and hope the same memories surface. And the evidence says memory errors do not correct themselves: in Yonsei University's harness-optimizer evaluation (Shor, arXiv:2605.22505), 94.4% of non-prompt harness errors (the memory, tool, and workflow class) persisted into the final harness. Errors in memory compound silently, and a deterministic routing layer is what makes them findable.

Structure pays on capability, not just operations. A University of Wisconsin–Madison study (arXiv:2604.13151) measured what happens when an agent's working state is externalized into structured harness state instead of raw context history: task success moved from 51.9% to 88.9% on Gemini-3.1-Flash-Lite (a gain of 37.0 percentage points) and from 63.0% to 92.6% on GPT-4.1 (a gain of 29.6 percentage points), with no weight changes. Their result, not ours, and direct support for treating memory as a structured, governed surface rather than an accumulating transcript.

The convergence evidence

The research is converging on the same split.

XSI-AIMS did not invent the idea that memory types differ — it specified the contract while the field was still treating memory as an implementation detail. In the weeks around the v2.0 drafting cycle, two independent teams instantiated the same architecture — the Working/Episodic split the spec now carries as two core subsystems. H-Mem (CUHK-Shenzhen and Huawei Cloud, arXiv:2605.15701, submitted May 15, 2026) splits working fragments from consolidated episodic memory in a four-level temporal tree, and beats MemoryOS, Mem0, and Zep on the LoCoMo benchmark with 92.01% accuracy. MUSE (ByteDance, arXiv:2605.27366, submitted May 26, 2026) makes the same split (a short-term context DAG for working state, per-skill memory files for episodic lessons) and lifts task success from 53.19% to 87.94% on their 35-task evaluation subset along the way. Neither team cites XSI-AIMS. Neither, as far as we can tell, was aware of it. That is convergence — independent groups and the spec arriving at the same structure in the same weeks, blind to one another — and convergence is stronger evidence than agreement.

One claim we bound deliberately: across the 11 papers and 4 production systems in our comparative survey, none implements prospective memory — future-directed intentions an agent holds and acts on later — as a first-class tier. The concept is established in the research literature, and the failure it addresses is measured directly: under concurrent task load, model compliance with standing instructions drops 2–21%, up to 50% for terminal-action constraints (arXiv:2603.23530). XSI-AIMS specifies Prospective Memory as a core subsystem (§3.112, RFC-0021, Accepted June 3, 2026) — the harness-level store those numbers argue for.

Where Atlas fits

One interface. Routed by declaration.

XSI-AIMS Atlas is the commercial memory routing engine XSI builds on this contract. Agents call one interface and declare the subsystem — Atlas routes each call to the right store and the right backend. Backends are swappable per subsystem, declared at registration — the bus is not, and Atlas honors both halves of that contract. The spec defines what each subsystem must guarantee, the implementer chooses how, and the declaration is recorded to the public XSI-AIMS agent registry for conformance audit. Memory contents are never exposed — the declaration of capability is. That split is deliberate. XSI-AIMS is an open standard that tells you what the memory contract is, and Atlas is how XSI runs it at production scale.

The honest part

What eight subsystems do not fix.

Eight subsystems do not tune themselves, because consolidation and decay policies are workload-specific engineering: a support agent and a coding agent want different forgetting curves, and the spec does not pick them for you. Atlas routes reads and writes to the right store, but it cannot repair a wrong write: bad memory in, bad memory out. The convergence evidence is two independent teams, not a settled industry taxonomy. And XSI has published no Atlas benchmarks; every number in this post is a cited external result, bounded to exactly what those papers measured.

The answer

So — why does XSI-AIMS have eight kinds of memory? Wrong question, it turns out. The 180× latency spread, the 94.4% error persistence, the two teams independently splitting working from episodic memory: all of it points the other way. The real question is why anyone thought one store was enough.

If you are running agents at a scale where memory routing is a real decision, talk to us.
