GLM-5: Open Source AI Slashes Hallucination Rates
z.ai's GLM-5 logs record low hallucination rate, beats Moonshot's Kimi K2.5
Why does a new open‑source model matter when the market is already crowded with proprietary giants? Because the metric that often separates hype from utility—hallucination rate—has finally dropped to a level that invites serious scrutiny. z.ai announced that its GLM‑5 model logged a record‑low hallucination rate, a claim backed by a fresh set of benchmarks released this week.
The same evaluation also introduced an RL technique the team nicknamed “slime,” which, according to the developers, helps the model stay grounded during inference. The timing is noteworthy: Moonshot unveiled its Kimi K2.5 just two weeks earlier, positioning it as a direct challenger from China’s well‑funded AI sector. Yet the fresh data suggest that an open‑source effort can not only keep pace but potentially outstrip that momentum.
The numbers compiled by Artificial Analysis paint an unexpected picture, one that reshapes how we view the balance of power between community-driven projects and heavily capitalised firms.
High performance
GLM-5's benchmarks make it the most powerful open-source model in the world, according to Artificial Analysis, surpassing Chinese rival Moonshot's Kimi K2.5, released just two weeks earlier, and showing that Chinese AI companies have nearly caught up with far better resourced proprietary Western rivals. According to z.ai's own materials shared today, GLM-5 ranks near state-of-the-art on several key benchmarks:
- SWE-bench Verified: GLM-5 scored 77.8, outperforming Gemini 3 Pro (76.2) and approaching Claude Opus 4.6 (80.9).
- Vending Bench 2: In a simulation of running a business, GLM-5 ranked #1 among open-source models with a final balance of $4,432.12.
Beyond performance, GLM-5 is aggressively undercutting the market.
GLM‑5 arrives under an open‑source MIT licence, a detail that may ease enterprise adoption. Its AA‑Omniscience Index score of –1, a 35‑point improvement, is the lowest hallucination rate reported in the Artificial Analysis Intelligence Index v4.0. In that same evaluation, it outperformed Moonshot's Kimi K2.5, a model released only two weeks earlier.
Yet the benchmark reflects a single, independent test; broader usage patterns remain unknown, and performance beyond the index is unclear. The 'slime' reinforcement‑learning technique is cited as a key factor, but no data on its generality has been published.
Critics might ask whether the –1 rating translates into real‑world reliability across diverse tasks. Meanwhile, the "most powerful open‑source model" ranking comes from Artificial Analysis, not a universally accepted standard. If enterprises prioritize low hallucination, GLM‑5 presents a compelling option, but its long‑term stability and performance under varied workloads have yet to be demonstrated.
The community will likely watch how the model fares beyond the initial index results.
Further Reading
- GLM-5 - Everything you need to know - Artificial Analysis
- GLM-5: From Vibe Coding to Agentic Engineering - Z.ai
- vectara/hallucination-leaderboard - GitHub/Vectara
- Why Kimi K2.5 Beats Gemini on Hallucination Benchmarks - YouTube
Common Questions Answered
What makes GLM-5's hallucination rate significant in the AI industry?
GLM-5 achieved a record-low hallucination rate with a score of -1 on the Artificial Analysis Intelligence Index v4.0, representing a 35-point improvement over previous models. This breakthrough suggests a major advancement in AI reliability and accuracy, potentially addressing one of the most critical challenges in large language model development.
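A score of –1 can seem puzzling until you consider how a hallucination-penalising index might be constructed. The sketch below illustrates one common scoring scheme, where correct answers earn +1, hallucinated (wrong) answers cost –1, and abstentions score 0; this is an illustrative assumption, not Artificial Analysis's published methodology, and the counts are invented.

```python
# Illustrative scoring scheme (assumed, not Artificial Analysis's exact
# methodology): correct answers +1, hallucinated answers -1, abstentions 0,
# scaled to a percentage-style index over all questions.

def omniscience_index(correct: int, hallucinated: int, abstained: int) -> float:
    """Return a score where hallucinations cancel out correct answers."""
    total = correct + hallucinated + abstained
    return 100 * (correct - hallucinated) / total

# A model that hallucinates slightly more often than it answers correctly
# lands just below zero; correct and hallucinated answers nearly cancel.
print(omniscience_index(correct=300, hallucinated=310, abstained=390))  # -1.0
```

Under a scheme like this, a near-zero score is strong precisely because wrong-but-confident answers are penalised: abstaining is free, so the only way to sink far below zero is to hallucinate often.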
How does GLM-5's architecture differ from its previous generations?
GLM-5 scales up to 744 billion total parameters with 40 billion active parameters, a significant increase from the previous 355 billion total parameters with 32 billion active parameters. The model integrates DeepSeek Sparse Attention and uses a Mixture-of-Experts architecture, expanding its pre-training to 28.5 trillion tokens and improving its overall performance capabilities.
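The gap between 744 billion total and 40 billion active parameters comes from Mixture-of-Experts routing: each token activates only a few expert sub-networks. The sketch below shows the general top-k routing idea with toy sizes; the expert counts and gate scores are illustrative assumptions, not GLM-5's actual configuration.

```python
# Minimal sketch of Mixture-of-Experts routing, illustrating why a model's
# "active" parameter count can be far below its total. Expert count and
# top-k value here are toy numbers, not GLM-5's real architecture.
import random

NUM_EXPERTS = 8  # total expert feed-forward networks in the layer
TOP_K = 2        # experts activated per token

def route_token(gate_scores, top_k=TOP_K):
    """Pick the top-k experts by gate score; only those run for this token."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    return ranked[:top_k]

# Fake gate scores for one token (a real model learns these with a router).
random.seed(0)
scores = [random.random() for _ in range(NUM_EXPERTS)]
active = route_token(scores)

# Only TOP_K of NUM_EXPERTS experts contribute compute for this token.
fraction_active = TOP_K / NUM_EXPERTS
print(f"active experts: {active}, fraction of expert params used: {fraction_active}")
```

Because the router selects only a small subset of experts per token, inference cost scales with the active parameters rather than the total, which is how a 744B-parameter model can run with roughly the compute of a 40B dense one.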
What is unique about GLM-5's development and licensing approach?
GLM-5 is released with an open-source MIT license, allowing for flexible enterprise deployment and avoiding vendor lock-in. The model was uniquely trained entirely on Huawei Ascend chips, emphasizing China's commitment to technological independence in AI infrastructure.