Zhipu AI employs Muon Optimizer and Muon Split in GLM-4.5 and GLM-5 pretraining

By AI Daily Post • Edited by Brian Petersen, Editor-in-Chief

June 5, 2026 • Updated: July 15, 2026 • 3 min read

The wall isn't made of silicon. At Zhipu AI, engineers hit it while training GLM-4.5. The problem was the optimizer—the software that tweaks a model's billions of internal knobs.

Adam, the industry standard, had stalled. So they turned to Muon, an optimizer that orthogonalizes each weight matrix's update before applying it, and carried it into their next model, the colossal 744-billion-parameter GLM-5. There they paired it with a surgical fix dubbed Muon Split.

That technique isolates a specific performance lag that shows up when certain attention mechanisms are mixed. It doesn't broadly boost efficiency. Instead of orthogonalizing the attention up-projection matrix as a single block, it orthogonalizes each attention head's slice of that matrix separately.
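Muon's core move, and the per-head variant described above, can be sketched in pure Python. This is a hypothetical illustration, not Zhipu AI's code: the function names are invented, it assumes the up-projection stores each head's rows as one contiguous block, and it swaps the tuned quintic Newton-Schulz iteration used in public Muon implementations for the classic cubic one, which is slower but converges to the exact orthogonal factor. It also uses plain rather than Nesterov momentum and omits shape-dependent rescaling and weight decay.

```python
# Hypothetical sketch: a simplified Muon step with an optional per-head
# ("split") mode for an attention up-projection. Pure Python, lists of rows.


def matmul(a, b):
    """(n x k) @ (k x m) -> (n x m) with plain loops."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthogonalize(g, steps=30):
    """Approximate the orthogonal factor U V^T of g = U S V^T.

    Classic cubic Newton-Schulz iteration (an assumption; practical Muon
    code uses a tuned quintic variant with fewer steps).
    """
    # Dividing by the Frobenius norm puts every singular value in [0, 1],
    # where the cubic iteration converges.
    norm = sum(v * v for row in g for v in row) ** 0.5 + 1e-12
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        # X <- 1.5 X - 0.5 (X X^T) X drives each nonzero singular value to 1.
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * xv - 0.5 * yv for xv, yv in zip(xr, yr)]
             for xr, yr in zip(x, xxtx)]
    return x


def split_orthogonalize(m, n_heads):
    """Orthogonalize each head's contiguous block of rows on its own.

    Assumes rows [h*d, (h+1)*d) belong to head h, with d = rows // n_heads.
    """
    if len(m) % n_heads:
        raise ValueError("row count must be divisible by n_heads")
    d = len(m) // n_heads
    out = []
    for h in range(n_heads):
        out.extend(orthogonalize(m[h * d:(h + 1) * d]))
    return out


def muon_step(weight, grad, momentum, lr=0.02, beta=0.95, n_heads=None):
    """One simplified Muon step; returns (new_weight, new_momentum).

    n_heads=None orthogonalizes the whole matrix; an integer switches to the
    hypothetical per-head split.
    """
    new_m = [[beta * mv + gv for mv, gv in zip(mr, gr)]
             for mr, gr in zip(momentum, grad)]
    if n_heads is None:
        update = orthogonalize(new_m)
    else:
        update = split_orthogonalize(new_m, n_heads)
    new_w = [[wv - lr * uv for wv, uv in zip(wr, ur)]
             for wr, ur in zip(weight, update)]
    return new_w, new_m


def sq_norm(m):
    """Squared Frobenius norm."""
    return sum(v * v for row in m for v in row)


if __name__ == "__main__":
    # Toy up-projection gradient: 2 heads x 2 rows each, latent rank 3.
    g = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0],
         [3.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
    print(round(sq_norm(orthogonalize(g)), 6))           # prints 3.0
    print(round(sq_norm(split_orthogonalize(g, 2)), 6))  # prints 4.0
```

On the toy gradient, the whole-matrix update has squared norm 3 in total, so the two heads cannot both receive a full orthonormal update; the split version gives each head's block orthonormal rows (squared norm 2 per head, 4 overall). The exact GLM-5 formulation may differ from this reading.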

"Muon Optimizer has gained great momentum with significant adoption from frontier AI Labs," notes the PyTorch Blog post "Using Muon Optimizer with DeepSpeed."

DeepSeek-V4, a 1.6-trillion-parameter beast, runs on Muon too. The goal there is faster convergence and steadier training. A trend is solidifying.

But integration is gritty work. Plugging Muon into standard frameworks like DeepSpeed reveals a clash of philosophies. Legacy optimizers like SGD and Adam can treat gradients as simple, flattened buffers, just lists of numbers, because each parameter's update depends only on its own gradient history.

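Muon demands a more structured view. Its update orthogonalizes each gradient matrix as a whole, so it needs the full two-dimensional shape, not an arbitrary slice of a flattened buffer.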
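To see why the flat-buffer view breaks down, the toy experiment below cuts a 4 x 2 gradient into two shards, the way a ZeRO-style partitioner might split a flattened buffer (a simplifying assumption, not DeepSpeed's actual layout). Adam's first step (zero-initialized moments, bias correction) is element-wise, so it gives the same result on the whole buffer or shard by shard; a cubic Newton-Schulz stand-in for Muon's orthogonalization does not. All names are hypothetical.

```python
# Toy comparison: element-wise Adam vs. matrix-level orthogonalization when a
# gradient is split into shards. Pure Python; all names are hypothetical.


def matmul(a, b):
    """(n x k) @ (k x m) -> (n x m) with plain loops."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthogonalize(g, steps=30):
    """Cubic Newton-Schulz stand-in for Muon's orthogonalization."""
    norm = sum(v * v for row in g for v in row) ** 0.5 + 1e-12
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * xv - 0.5 * yv for xv, yv in zip(xr, yr)]
             for xr, yr in zip(x, xxtx)]
    return x


def adam_first_step(flat_grad, lr=1e-3, eps=1e-8):
    """Adam's first update with zero-initialized moments and bias correction.

    Then m_hat = g and v_hat = g * g, so the step is -lr * g / (|g| + eps):
    each element depends only on its own gradient.
    """
    return [-lr * g / (abs(g) + eps) for g in flat_grad]


def sq_norm(m):
    """Squared Frobenius norm."""
    return sum(v * v for row in m for v in row)


grad = [[2.0, 1.0], [0.5, 1.0], [1.0, 3.0], [0.0, 1.0]]  # 4 x 2 gradient
flat = [v for row in grad for v in row]                   # flattened buffer

# Element-wise: updating the two shards separately matches the full update.
full_adam = adam_first_step(flat)
sharded_adam = adam_first_step(flat[:4]) + adam_first_step(flat[4:])

# Matrix-level: each shard (rows 0-1 and rows 2-3) is orthogonalized on its
# own, which is not the same as orthogonalizing the whole matrix.
full_muon = orthogonalize(grad)
sharded_muon = orthogonalize(grad[:2]) + orthogonalize(grad[2:])

if __name__ == "__main__":
    print(full_adam == sharded_adam)          # prints True
    print(round(sq_norm(full_muon), 6))       # prints 2.0
    print(round(sq_norm(sharded_muon), 6))    # prints 4.0
```

The Adam comparison prints True because no element's step looks at its neighbors. The orthogonalized results differ (squared norms 2.0 versus 4.0), so a sharded framework has to hand a Muon-style optimizer each complete matrix, for example by gathering shards before the update.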

Progress now hinges on revisiting these fundamentals. The labs defining the frontier increasingly treat the optimizer itself as a critical bottleneck. Their next breakthroughs won't be measured in raw parameter counts alone. They'll be tallied in weeks of saved training time and in costly failed runs that never happen.

Common Questions Answered

What is the Muon Optimizer and why did Zhipu AI adopt it for GLM-4.5?

Muon is an optimizer for a network's two-dimensional weight matrices: it smooths gradients with momentum, then orthogonalizes each matrix's update before applying it. Zhipu AI turned to Muon after Adam, the industry-standard optimizer, stalled during GLM-4.5 training, treating it as a more effective way to adjust a model's billions of internal parameters during pretraining.

How does Muon Split address performance issues in GLM-5 pretraining?

Muon Split is a surgical technique that targets a specific performance lag that appears when certain attention mechanisms are mixed during training. Rather than orthogonalizing the attention up-projection matrix as one block, it orthogonalizes each attention head's portion separately. It is a targeted fix in the pretraining of GLM-5, Zhipu AI's 744-billion-parameter model, not a broad efficiency boost.

What fundamental difference exists between legacy optimizers like Adam and the Muon Optimizer?

Legacy optimizers like SGD and Adam treat gradients as simple, flattened buffers (just lists of numbers), because each parameter's update depends only on its own gradient history. Muon orthogonalizes each gradient matrix as a whole, so it needs that matrix structure intact. This difference creates integration challenges when plugging Muon into standard frameworks like DeepSpeed, which were built around the flat-buffer assumption.

Which other large language models are using the Muon Optimizer besides GLM-5?

DeepSeek-V4, a 1.6-trillion-parameter model, also runs on the Muon Optimizer, with the goal of faster convergence and steadier training. Adoption across different labs suggests that Muon is becoming a viable alternative to traditional optimizers for large-scale pretraining.
