MemPalace's AI memory system benchmark claims and methodology.
GitHub Issue
Apr 7, 2026
_Disclosure: I'm working on a different AI memory project and am an author of a public LoCoMo ground-truth audit: github.com/dial481/locomo-au...
The 100% LoCoMo claim is what immediately drew my attention to this repository.
### 1. 100% on LoCoMo should not be achievable. The ground truth is broken.
Our audit documents ~99 wrong, hallucinated, misattributed, or ambiguous answers in the LoCoMo ground truth across all ten conversations (`errors_conv_0.json` through `errors_conv_9.json`). Examples include hallucinated objects ("symbols," "bowl") and speaker-attribution errors where the evidence dialog is spoken by the wrong character. The honest ceiling on LoCoMo as published is roughly **93–94%**, not 100%.
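The ceiling arithmetic can be sketched as follows. This is a minimal illustration, not the audit's own script: it assumes every flawed reference answer costs a correct system exactly one point, and because the total number of scored questions is not stated here, `n_total` is left as a parameter. The function names and the value 1500 are hypothetical, chosen only for illustration.

```python
import math
from fractions import Fraction


def score_ceiling(n_bad: int, n_total: int) -> float:
    """Best accuracy a fully correct system can reach when n_bad of the
    n_total reference answers are themselves wrong."""
    if n_total <= 0 or not 0 <= n_bad <= n_total:
        raise ValueError("need 0 <= n_bad <= n_total and n_total > 0")
    return 1.0 - n_bad / n_total


def implied_total_range(n_bad: int, ceiling_lo: Fraction, ceiling_hi: Fraction):
    """Smallest and largest n_total for which the ceiling lies in
    [ceiling_lo, ceiling_hi]. Exact fractions avoid float rounding at the edges."""
    # ceiling >= lo  <=>  n_bad / n_total <= 1 - lo  <=>  n_total >= n_bad / (1 - lo)
    smallest = math.ceil(Fraction(n_bad) / (1 - ceiling_lo))
    # ceiling <= hi  <=>  n_total <= n_bad / (1 - hi)
    largest = math.floor(Fraction(n_bad) / (1 - ceiling_hi))
    return smallest, largest


# 99 flawed answers and a 93% to 94% ceiling are the figures quoted above.
print(implied_total_range(99, Fraction("0.93"), Fraction("0.94")))  # (1415, 1650)

# Illustrative only: 1500 is a made-up question count inside that range.
print(round(score_ceiling(99, 1500), 3))  # 0.934
```

Under these assumptions, 99 flawed answers and a 93% to 94% ceiling are mutually consistent for any question count between 1,415 and 1,650.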
A reported 100% therefore implies one of two things: the system is wrong in the same ways the ground truth is wrong, or the metric being reported is not reliably measuring answer correctness.
On the second possibility: the audit also documents that the LoCoMo LLM judge accepts up to ~63% of intentionally wrong answers.
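To see why a lenient judge matters, here is a rough sketch of how a false-accept rate inflates a reported score. It rests on a simplifying assumption that is not taken from the audit: the judge accepts every truly correct answer and accepts wrong answers at one fixed rate. The ~63% figure is an upper bound measured on intentionally wrong answers, so real inflation may be smaller. `reported_score` and the 80% accuracy are hypothetical.

```python
def reported_score(true_accuracy: float, false_accept_rate: float) -> float:
    """Score a judge would report if it accepts all correct answers and
    also accepts a fixed share of the wrong ones (simplified model)."""
    if not (0.0 <= true_accuracy <= 1.0 and 0.0 <= false_accept_rate <= 1.0):
        raise ValueError("both arguments must lie in [0, 1]")
    wrong_share = 1.0 - true_accuracy
    return true_accuracy + wrong_share * false_accept_rate


# Illustrative only: a system that is right 80% of the time,
# graded by a judge that waves through 63% of wrong answers.
print(round(reported_score(0.80, 0.63), 3))  # 0.926
```

In this model the reported number says as much about the judge as about the system, which is why a judged score close to 100% is weak evidence of answer correctness.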
### 2. The 100% is a retrieval bypass. Disclosed in this repo, stripped from the launch tweet.
`benchmarks/BENCHMARKS.md`, verbatim:
> "The LoCoMo 100% result with top-k=50 has a structural issue: each of the 10 conversations has 19–32 sessions, but top-k=50 exceeds that count. This means the ground-truth session is always in the candidate pool regardless of the embedding model's ranking. The Sonnet rerank is essentially doing read...
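The structural issue described in the quote can be reproduced with a toy simulation. The sketch below uses a ranker with no signal at all (a random shuffle) and checks whether the gold session lands in the top-k candidates. Only the session counts (19 to 32) and `top-k=50` come from the quoted text; the function names, trial count, and seed are hypothetical, and none of this is the repository's benchmark code.

```python
import random


def recall_at_k(ranking: list[int], gold: int, k: int) -> bool:
    """True if the gold session id appears among the first k ranked ids."""
    return gold in ranking[:k]


def random_ranker_recall(n_sessions: int, k: int,
                         trials: int = 1000, seed: int = 0) -> float:
    """Recall@k of a ranker that orders sessions uniformly at random."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ranking = list(range(n_sessions))
        rng.shuffle(ranking)              # no retrieval signal whatsoever
        gold = rng.randrange(n_sessions)  # which session holds the answer
        hits += recall_at_k(ranking, gold, k)
    return hits / trials


# k exceeds the session count, so even a random ranker never misses.
print(random_ranker_recall(19, 50))  # 1.0
print(random_ranker_recall(32, 50))  # 1.0

# With k below the session count, the same ranker is clearly imperfect.
print(random_ranker_recall(32, 5) < 1.0)  # True
```

When `k` is at least the number of sessions, `ranking[:k]` is the whole conversation, so recall is 1.0 by construction and tells you nothing about the embedding model.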