GLM-5.1: AI That Rewrites Its Own Code Autonomously
Zhipu AI's GLM-5.1 optimizes code over hundreds of rounds, thousands of tool calls
Why does this matter? Because a language model that can rewrite its own code while it’s running pushes the boundary of what developers expect from AI assistance. While most LLMs stop at generating a single solution, Zhipu AI’s newest release, GLM‑5.1, keeps iterating, testing and re‑tooling until it lands on a more efficient answer.
The company showcases three internally run demos to prove the point. In the first, the model is tasked with tuning a vector‑database pipeline; midway through, it abandons its initial approach and adopts a completely different method. The second and third scenarios follow the same pattern, each involving dozens of back‑and‑forth calls to external utilities.
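Zhipu AI has not published how its iteration loop works internally, but the reported pattern of repeated proposal, benchmarking, and mid-task strategy switches can be sketched in a few lines. Everything below is a hypothetical illustration; the function names, the `patience` heuristic, and the loop structure are assumptions, not Zhipu AI's actual API:

```python
import random

def optimize(candidate, evaluate, strategies, rounds=200, patience=20):
    """Illustrative agent loop: apply a strategy, benchmark the result,
    and switch strategies after repeated failures to improve.
    All names here are hypothetical, not Zhipu AI's implementation."""
    best, best_score = candidate, evaluate(candidate)
    strategy = strategies[0]
    stale = 0
    for _ in range(rounds):
        trial = strategy(best)      # one "tool call": propose a revision
        score = evaluate(trial)     # another: run the benchmark
        if score > best_score:
            best, best_score, stale = trial, score, 0
        else:
            stale += 1
            if stale >= patience:   # dead end: abandon the current approach
                strategy = random.choice(strategies)
                stale = 0
    return best, best_score
```

The key behavior the demos highlight is the `else` branch: rather than grinding on a stalled approach, the loop discards it entirely once progress stops.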
Zhipu AI describes optimization across "hundreds of rounds and thousands of tool calls." The company demonstrates this with three scenarios, though all of them were conducted internally.

GLM-5.1 switches strategies on its own mid-task

In the first scenario, GLM-5.1 had to optimize a vector database - a system that searches large datasets and finds similar entries. The goal: answer as many search queries per second as possible without losing accuracy. In a standard test run with 50 rounds, Claude Opus 4.6 held the previous best score of 3,547 queries per second, according to Zhipu AI.
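The headline metric in this scenario is queries per second under an accuracy constraint. Zhipu AI has not published its benchmark harness, so the following is only a minimal sketch of how such a score could be computed; `search_fn`, the query set, and the recall threshold are all assumptions:

```python
import time

def benchmark_qps(search_fn, queries, expected, min_recall=0.95, duration=1.0):
    """Hypothetical harness: measure queries/sec while checking result
    quality. Runs disqualify (score 0) if accuracy drops below the
    threshold, so speed cannot be bought by returning wrong answers."""
    hits = served = 0
    deadline = time.perf_counter() + duration
    i = 0
    while time.perf_counter() < deadline:
        q = queries[i % len(queries)]
        result = search_fn(q)
        hits += (result == expected[i % len(expected)])
        served += 1
        i += 1
    recall = hits / served
    qps = served / duration
    return qps if recall >= min_recall else 0.0
```

A harness of this shape would explain why the model cannot simply trade accuracy for throughput: any strategy that breaks the recall floor scores zero, regardless of speed.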
Zhipu AI’s GLM‑5.1 arrives as a free model aimed at long‑running programming tasks. On the SWE‑Bench Pro benchmark it narrowly outperforms GPT‑5.4 and Claude Opus 4.6, suggesting competitive capability.
The company stresses that the model avoids dead ends by repeatedly reviewing its own strategy and, when needed, fundamentally changing its approach. That degree of iteration, spanning hundreds of rounds and thousands of tool calls, is rarely seen in comparable systems. The three internal scenarios illustrate the claim, including the case where GLM‑5.1 switched strategies mid‑task while optimizing a vector database.
Because all demonstrations were conducted internally, external validation remains limited. It's unclear whether the self‑revising loop will scale reliably across diverse codebases or integrate smoothly with existing developer workflows. The results are promising, yet the absence of broader testing leaves open questions about real‑world robustness and consistency.
Until independent evaluations are published, the practical impact of GLM‑5.1’s iterative coding approach stays uncertain.
Further Reading
- GLM-5.1: Zhipu AI's 8-Hour Autonomous Reasoning Model - Automatio.ai
- Z.ai unveils GLM-5.1, enabling AI coding agents to run autonomously for hours - Computerworld
- Z.ai ups ante in open-source LLMs with GLM-5.1 - Constellation Research
Common Questions Answered
How does GLM-5.1 differ from other language models in code optimization?
GLM-5.1 can continuously iterate and rewrite its own code across hundreds of rounds and thousands of tool calls, unlike most language models that generate a single solution. The model switches strategies mid-task on its own and keeps testing and refining until it lands on a more efficient answer.
What specific benchmark performance does GLM-5.1 demonstrate?
On the SWE-Bench Pro benchmark, GLM-5.1 narrowly outperforms GPT-5.4 and Claude Opus 4.6, indicating competitive AI programming capabilities. The model is particularly notable for its ability to avoid dead ends by repeatedly reviewing and fundamentally changing its approach when needed.
What was the vector database optimization scenario demonstrated by Zhipu AI?
In the first internal scenario, GLM-5.1 was tasked with optimizing a vector database pipeline to maximize search queries per second while maintaining accuracy. The model showcased its ability to dynamically adjust strategies and improve performance through multiple iterations.