GitHub - stephen487/enki-benchmarks · GitHub
Enki — Long-Term Memory for AI Agents
Enki is a memory engine for LLM agents. This repository publishes evaluation results only — the engine is closed-source. No configuration, internals, or methodology beyond what is described below is included here.
LongMemEval — Enki vs mem0 (head-to-head)
Both systems ingest identical conversation histories from LongMemEval-S. For each question, the same model (Claude Haiku) answers from each system's retrieved memories, and the same LLM-as-judge grades the answer, at equal retrieval depth (K=10). The only variable is the memory layer.
Validated slice: 25 instances (full-benchmark run in progress).
| Question type | Enki | mem0 |
| --- | --- | --- |
| Multi-session reasoning | 4 / 5 | 2 / 5 |
| Knowledge update | 3 / 5 | 3 / 5 |
| Single-session (user) | 3 / 5 | 3 / 5 |
| Single-session (assistant) | 2 / 5 | 2 / 5 |
| Single-session (preference) | 2 / 5 | 2 / 5 |
| Total | 14 / 25 | 12 / 25 |
Storage: Enki answers from 0.49× the stored facts mem0 keeps on the same conversations (mean 138 vs 283).
Standout: multi-session reasoning (4/5 vs 2/5).
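The storage ratio quoted above follows directly from the two reported means. A minimal arithmetic check, using only the figures from this README (the variable names are illustrative, not part of Enki or mem0):

```python
# Mean stored facts per conversation, as reported in this README.
ENKI_MEAN_FACTS = 138
MEM0_MEAN_FACTS = 283

# Ratio of Enki's store size to mem0's, and the implied reduction.
ratio = ENKI_MEAN_FACTS / MEM0_MEAN_FACTS
reduction = 1 - ratio

print(f"storage ratio: {ratio:.2f}x")   # storage ratio: 0.49x
print(f"reduction:     {reduction:.0%}")  # reduction:     51%
```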
Honest framing. This is a small, hand-validated slice. The overall margin (14 vs 12) is modest and within the sampling noise of a 25-item sample. The robust, repeatable result is comparable answer accuracy at roughly half the memory footprint, with a clear multi-session advantage. Further evaluation is ongoing.
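To make the sample-size caveat concrete, the sketch below recomputes the totals from the per-type scores in the table and attaches a 95% Wilson score interval to each system's accuracy. This is an illustration only, not the evaluation code: the choice of the Wilson interval is an assumption made here, and it treats the two systems as independent samples. Because both systems answered the same 25 questions, a paired test on the per-question results would be the stricter comparison, and those results are not published here.

```python
import math

# Per-type correct counts (Enki, mem0) out of 5, copied from the table above.
SCORES = {
    "Multi-session reasoning": (4, 2),
    "Knowledge update": (3, 3),
    "Single-session (user)": (3, 3),
    "Single-session (assistant)": (2, 2),
    "Single-session (preference)": (2, 2),
}
N_PER_TYPE = 5


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


n = N_PER_TYPE * len(SCORES)
enki_total = sum(enki for enki, _ in SCORES.values())
mem0_total = sum(mem0 for _, mem0 in SCORES.values())

enki_ci = wilson_interval(enki_total, n)
mem0_ci = wilson_interval(mem0_total, n)

print(f"Enki: {enki_total}/{n}, 95% interval {enki_ci[0]:.2f} to {enki_ci[1]:.2f}")
print(f"mem0: {mem0_total}/{n}, 95% interval {mem0_ci[0]:.2f} to {mem0_ci[1]:.2f}")
```

With 25 items, the two intervals come out at roughly 0.37 to 0.73 and 0.30 to 0.67. They overlap almost entirely, which is why the overall margin is described as modest.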
Retrieval latency (CPU-only)
Measured on a ~139-fact store, CPU-only (no GPU), 240 samples:
| Statistic | Latency (ms) |
| --- | --- |
| mean | 7.6 |
| p50 | 6.1 |
| p95 | 11.9 |
| p99 | 13.0 |
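For readers who want to produce a comparable table for their own memory layer, here is a minimal sketch of how such statistics can be collected. It is not Enki's benchmark harness, which is not published. `fake_retrieve`, `time_calls`, and `summarize_latencies` are hypothetical names introduced for illustration, and the percentile method (the standard library's inclusive quantiles) is an assumption.

```python
import statistics
import time
from typing import Callable


def time_calls(fn: Callable[[], object], samples: int = 240) -> list[float]:
    """Call fn repeatedly and return per-call wall-clock latency in milliseconds."""
    latencies_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize_latencies(latencies_ms: list[float]) -> dict[str, float]:
    """Return mean, p50, p95, and p99 of a list of latencies."""
    # quantiles(n=100) yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


def fake_retrieve() -> list[int]:
    """Hypothetical stand-in for a top-K retrieval call; replace with a real one."""
    store = range(139)
    return sorted(store, key=lambda fact_id: -fact_id)[:10]


if __name__ == "__main__":
    summary = summarize_latencies(time_calls(fake_retrieve, samples=240))
    for name, value in summary.items():
        print(f"{name}: {value:.3f} ms")
```

The numbers printed by this sketch reflect the stand-in function only and say nothing about Enki's own latency.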
Reproducibility
Full methodology and per-question results are available on request.
Enki Labs (UK) · 2026
License
MIT license