MGBench
MGBench is the public benchmark surface for memory governance in agentic long-term memory.
It evaluates whether a memory system preserves useful long-term context while blocking stale, invalidated, cross-scope, contradictory, or failed historical memory from becoming agent-usable context.
Public Release
| Item | Value |
|---|---|
| Repository | ostinatocc/MGBench |
| Current release | v0.1.1 |
| DOI | 10.5281/zenodo.20793097 |
| Scenarios | 608 frozen deterministic scenarios |
| Suites | 8 governance suites |
| LLM judge dependency | 0 |
What It Tests
MGBench is built around admission quality, not semantic recall alone.
| Suite | What it probes |
|---|---|
| Credibility governance | Whether source trust and evidence affect admission. |
| Controlled forgetting | Whether suppression, archival, and restoration are respected. |
| Scope isolation | Whether memory stays inside the correct project, tenant, or workspace. |
| Execution-tree effect | Whether execution-state memory survives compression and handoff. |
| Ordinary-memory governance | Whether preferences, facts, and general memory avoid unsafe promotion. |
| High-trust conflict governance | Whether newer evidence can challenge older trusted memory. |
| Lifecycle inference | Whether current, stale, failed, contested, and rehydrate states are handled. |
| Execution-tree stress | Whether branch state remains safe under noisy execution histories. |
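To make "admission quality" concrete, here is a minimal sketch of the kind of gate several of these suites probe. It is not MGBench's scenario schema or Aionis's implementation: the `Candidate` type, the `admit` function, the scope strings, and the state labels are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical lifecycle labels, loosely mirroring states named in the suite table.
BLOCKED_STATES = {"stale", "failed", "contested", "suppressed"}


@dataclass(frozen=True)
class Candidate:
    memory_id: str
    scope: str  # assumed: a project, tenant, or workspace identifier
    state: str  # assumed: "current", "stale", "failed", ...
    text: str


def admit(candidates, active_scope):
    """Return only the candidates allowed to become agent-usable context."""
    return [
        c
        for c in candidates
        if c.scope == active_scope and c.state not in BLOCKED_STATES
    ]


candidates = [
    Candidate("m1", "project-a", "current", "Deploy target is staging."),
    Candidate("m2", "project-a", "stale", "Deploy target is legacy-prod."),
    Candidate("m3", "project-b", "current", "Use the beta API key."),
]

admitted_ids = [c.memory_id for c in admit(candidates, "project-a")]
print(admitted_ids)  # ['m1']
```

In this toy case `m2` is semantically relevant but stale, and `m3` is current but belongs to another scope. A system tuned for recall alone could surface both; a governed admission step should surface neither.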
Why It Is In The Docs
Aionis is built around governed admission:
candidate memory -> admission decision -> agent context -> feedback -> measure

MGBench gives that product claim a public, frozen test surface. It is useful for Aionis itself and for other memory systems that want to report memory governance behavior without relying on an LLM judge.
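One way to picture the final "measure" step without an LLM judge is a plain set comparison between what a scenario expects and what the system admitted. The sketch below assumes each scenario lists the memory IDs that should and should not reach agent context. The `score_scenario` function and its field names are hypothetical and do not describe MGBench's actual scorer.

```python
def score_scenario(expected_admitted, expected_blocked, actual_admitted):
    """Deterministically compare admitted memory IDs against expectations."""
    actual = set(actual_admitted)
    missed = set(expected_admitted) - actual  # useful memory that was dropped
    leaked = set(expected_blocked) & actual   # unsafe memory that reached context
    return {
        "passed": not missed and not leaked,
        "missed": sorted(missed),
        "leaked": sorted(leaked),
    }


# Hypothetical scenario: m1 should be admitted; m2 and m3 should be blocked.
result = score_scenario(
    expected_admitted=["m1"],
    expected_blocked=["m2", "m3"],
    actual_admitted=["m1", "m3"],
)
print(result)  # {'passed': False, 'missed': [], 'leaked': ['m3']}
```

Because the check is an exact comparison over fixed inputs, the same system output always yields the same verdict, which is the property a frozen, judge-free benchmark relies on.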