MRAgent beats RAG, A-MEM, MemoryOS, LangMem, Mem0 with 118K tokens/query

MRAgent is the latest entry in a crowded field of agentic memory frameworks. A-MEM relies on a graph-based approach, MemoryOS layers memory hierarchically, and LangMem and Mem0 likewise promise persistent context across long interactions. The researchers tested these systems on two long-term memory benchmarks, LoCoMo and LongMemEval, using Gemini 2.5 Flash and Claude Sonnet 4.5 as backbone models.

Across every question type and with both backbone models, MRAgent scored higher than standard RAG, A-MEM, MemoryOS, LangMem, and Mem0. For enterprise developers, though, the numbers that matter most are token and runtime costs. On LongMemEval, MRAgent kept prompt consumption to 118K tokens per sample; A-MEM needed 632K, and LangMem burned through 3.26 million.
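
To put those figures in proportion, the short Python sketch below divides each reported per-sample count by MRAgent's. The inputs are the rounded numbers quoted in the article, so the ratios are approximate; the `PROMPT_TOKENS` dictionary and `relative_to` helper are illustrative names, not part of any MRAgent release.

```python
# Reported LongMemEval prompt tokens per sample (rounded figures from the article).
PROMPT_TOKENS = {
    "MRAgent": 118_000,
    "A-MEM": 632_000,
    "LangMem": 3_260_000,
}


def relative_to(reference: str, counts: dict[str, int]) -> dict[str, float]:
    """Express each system's token count as a multiple of the reference system's."""
    base = counts[reference]
    return {name: count / base for name, count in counts.items()}


ratios = relative_to("MRAgent", PROMPT_TOKENS)
for name, ratio in ratios.items():
    # Thousands separators for tokens, one decimal place for the multiple.
    print(f"{name:<8} {PROMPT_TOKENS[name]:>9,} tokens  {ratio:4.1f}x MRAgent")
```

On these numbers, A-MEM uses roughly 5.4 times as many prompt tokens as MRAgent and LangMem roughly 27.6 times as many.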

MRAgent also roughly halved runtime, from 1,122 seconds for A-MEM to 586 seconds. The efficiency stems from three mechanisms: tags are evaluated on demand rather than up front, irrelevant search paths are pruned, and an autonomous stop condition prevents redundant searches. In short, the framework trims both context size and compute time while delivering stronger benchmark performance.
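
The article names these three mechanisms but not how they are implemented. As a rough illustration only, the Python sketch below combines them in a generic best-first search over a tagged memory tree. It is not MRAgent's algorithm: the `MemoryNode` structure, the keyword-overlap `tag_score` (a stand-in for whatever relevance judgment MRAgent actually makes), the `threshold`, and both budgets are hypothetical choices.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class MemoryNode:
    """Hypothetical memory entry: descriptive tags, optional stored text, child entries."""
    tags: frozenset[str]
    text: str = ""
    children: list["MemoryNode"] = field(default_factory=list)


def tag_score(tags: frozenset[str], query: set[str]) -> float:
    """Stand-in relevance judgment (hypothetical): share of query terms found in the tags."""
    return len(tags & query) / max(len(query), 1)


def search(root: MemoryNode, query: set[str], threshold: float = 0.3,
           max_evidence: int = 2, max_evals: int = 50) -> tuple[list[str], int]:
    """Illustrative best-first search, not MRAgent's actual procedure."""
    evidence: list[str] = []
    evaluations = 0
    # Heap entries are (negated score, insertion order, node); the counter
    # breaks ties so nodes themselves are never compared.
    frontier = [(-1.0, 0, root)]
    order = 1
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if node.text:
            evidence.append(node.text)
        # Stop condition: enough evidence collected or scoring budget used up
        # (checked once per expansion, so one node's children may overshoot it).
        if len(evidence) >= max_evidence or evaluations >= max_evals:
            break
        # On-demand evaluation: children are scored only when their parent is
        # expanded, so nothing beneath a pruned node is ever scored.
        for child in node.children:
            score = tag_score(child.tags, query)
            evaluations += 1
            if score < threshold:
                continue  # prune this path
            heapq.heappush(frontier, (-score, order, child))
            order += 1
    return evidence, evaluations


memory = MemoryNode(frozenset({"root"}), children=[
    MemoryNode(frozenset({"travel", "japan"}), children=[
        MemoryNode(frozenset({"japan", "flight"}), "Flight to Tokyo booked for May 3."),
        MemoryNode(frozenset({"japan", "hotel"}), "Hotel in Kyoto for two nights."),
    ]),
    MemoryNode(frozenset({"work", "meeting"}), children=[
        MemoryNode(frozenset({"meeting", "budget"}), "Budget review on Friday."),
    ]),
])

evidence, evaluations = search(memory, {"japan", "flight"})
print(evidence)     # both Japan entries; the work branch is pruned
print(evaluations)  # 4 tag evaluations; the budget entry is never scored
```

In this toy tree the query is answered after four tag evaluations, and the entry under the pruned work branch is never scored. The trade-off is visible too: a stricter threshold saves more evaluations but can prune a branch that held the answer.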

Why this matters

We see MRAgent using 118K prompt tokens per sample on LongMemEval while still beating RAG, A-MEM, MemoryOS, LangMem, and Mem0 across models and question types. That gap is notable, especially when LangMem expends 3.26M tokens for comparable work. Yet the headline numbers don't tell the whole story.

Enterprise teams care most about computational cost, and the article stops short of any concrete cost analysis or hardware requirements. The reported runtimes point to lower latency, but without pricing or hardware details it's unclear how much the token savings would cut cloud bills. Moreover, although the two benchmarks are named, their task mix and complexity are not described, leaving room for doubt about real-world applicability.

We appreciate the clear win in raw token usage, but we remain cautious until we see transparent cost metrics and broader testing across diverse workloads. For developers and founders, the takeaway is that a new memory framework shows promise, yet further evidence is needed before committing resources to replace existing pipelines.
