Tilde Research's Aurora optimizer beats Muon and NorMuon at 340M scale

By AI Daily Post. Edited by Brian Petersen, Editor-in-Chief

May 12, 2026 • 2 min read

Tilde Research has released Aurora, a new optimizer designed to fix a hidden flaw in Muon. Muon earned praise for beating AdamW in wall-clock time to convergence on the nanoGPT speedrun, but it also quietly kills off a sizable share of MLP neurons during training, leaving them permanently dead. Aurora targets that problem directly, and the team backs it with a 1.1-billion-parameter pretraining run and a new state-of-the-art score on the modded-nanoGPT speedrun benchmark. The code is open source, so anyone can test the claims.

To understand why Aurora matters, recall Muon's core step: it computes the polar factor of the gradient matrix. Given a thin SVD G = UΣVᵀ, Muon forms polar(G) = UVᵀ and updates the weights as W ← W − η UVᵀ. In practice it approximates the polar factor with matmul-only iterative algorithms that scale to large matrices, rather than computing an SVD. Before Aurora, NorMuon introduced a row-normalization tweak (similar to Adam's per-parameter scaling) that improved speedrun results, yet the underlying reason remained murky.
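To make the polar-factor step concrete, here is a minimal NumPy sketch. It computes UVᵀ exactly from a thin SVD and approximates it with the textbook cubic Newton-Schulz iteration, which uses only matrix multiplications. The function names, the step count, and the learning rate are illustrative assumptions, the iteration is the classical one rather than whatever tuned polynomial a given Muon release uses, and momentum is left out.

```python
import numpy as np


def polar_svd(G):
    """Exact polar factor U @ V^T from a thin SVD G = U S V^T."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


def polar_newton_schulz(G, steps=30):
    """Approximate polar(G) with the classical cubic Newton-Schulz iteration.

    Only matrix multiplications are used. The step count is an assumption
    chosen for this small example, not a value taken from Muon.
    """
    X = G / np.linalg.norm(G)  # Frobenius scaling puts singular values in (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # pushes every singular value toward 1
    return X


def muon_step(W, G, lr=0.02):
    """Simplified update W <- W - lr * polar(G); momentum is omitted."""
    return W - lr * polar_newton_schulz(G)


rng = np.random.default_rng(0)
G = rng.standard_normal((8, 4))          # stand-in for a gradient matrix
Q_exact = polar_svd(G)
Q_iter = polar_newton_schulz(G)
W_new = muon_step(np.zeros((8, 4)), G)

# Largest entrywise gap between the iterative and the SVD-based result.
print(np.max(np.abs(Q_iter - Q_exact)))
```

On a well-conditioned matrix like this one, the iterative result should agree with the SVD-based polar factor to within floating-point tolerance, which is why a matmul-only routine can stand in for the SVD.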

Aurora promises a more principled fix. It remains to be seen how broadly the fix will translate across model families.

U-NorMuon corrects this by normalizing the rows of a tall m × n matrix (m > n) to have norm √(n/m) instead of 1.
In experiments at 340M scale, U-NorMuon outperforms both Muon and standard NorMuon and completely eliminates the neuron death phenomenon: leverage scores become approximately isotropic throughout training. Crucially, U-NorMuon propagates this benefit to layers it doesn’t directly touch: keeping up/gate rows alive ensures isotropic gradient flow into the down-projection, stabilizing its column leverage without any direct intervention.
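The √(n/m) target has a simple reading. An m × n matrix with orthonormal columns has squared Frobenius norm n, so its m rows have an average squared norm of n/m. The sketch below illustrates that idea in NumPy, assuming "leverage score" means the standard definition (the squared norm of each row of the polar factor). The helper names and the artificially shrunken gradient row are hypothetical, and the code covers only the row-rescaling step, not the full U-NorMuon optimizer.

```python
import numpy as np


def polar_svd(G):
    """Exact polar factor U @ V^T from a thin SVD."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


def row_leverage_scores(G):
    """Squared row norms of polar(G); they sum to n for a full-rank m x n G."""
    Q = polar_svd(G)
    return np.sum(Q * Q, axis=1)


def rescale_rows(Q, eps=1e-12):
    """Rescale every row of a tall m x n matrix to norm sqrt(n / m)."""
    m, n = Q.shape
    norms = np.linalg.norm(Q, axis=1, keepdims=True)
    return Q * (np.sqrt(n / m) / (norms + eps))


rng = np.random.default_rng(0)
G = rng.standard_normal((16, 4))   # tall matrix: m = 16 rows, n = 4 columns
G[0] *= 1e-3                       # hypothetical near-dead neuron: tiny gradient row

scores = row_leverage_scores(G)    # row 0 receives almost no leverage
D = rescale_rows(polar_svd(G))     # every row now has norm sqrt(4 / 16) = 0.5
row_norms = np.linalg.norm(D, axis=1)
```

In this toy setup the shrunken row gets a near-zero leverage score under the plain polar factor, so it would receive almost no update. After rescaling, every row carries the same update magnitude.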

However, U-NorMuon still has a problem: it forcefully overrides the polar factor with uniform row norms, sacrificing polar factor precision, which is both theoretically undesirable and empirically costly in the Muon framework (the paper shows that Muon achieves monotonically lower loss with more precise orthogonalization).
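The precision cost can be checked numerically. The following sketch, using hypothetical helper names and a random stand-in for the gradient, measures how far a matrix is from having orthonormal columns before and after its rows are forced to the uniform norm √(n/m). It illustrates the trade-off described above and is not the paper's own measurement.

```python
import numpy as np


def polar_svd(G):
    """Exact polar factor U @ V^T from a thin SVD."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt


def orthogonality_error(X):
    """Frobenius distance between X^T X and the identity."""
    n = X.shape[1]
    return np.linalg.norm(X.T @ X - np.eye(n))


def rescale_rows(Q):
    """Force every row of a tall m x n matrix to norm sqrt(n / m)."""
    m, n = Q.shape
    return Q * (np.sqrt(n / m) / np.linalg.norm(Q, axis=1, keepdims=True))


rng = np.random.default_rng(0)
Q = polar_svd(rng.standard_normal((16, 4)))

err_polar = orthogonality_error(Q)                   # near zero: columns are orthonormal
err_rescaled = orthogonality_error(rescale_rows(Q))  # clearly nonzero after row rescaling
print(err_polar, err_rescaled)
```

Unless the unit-normalized rows happen to form a tight frame, rescaling them moves the matrix away from the exact polar factor.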

Tilde Research Introduces Aurora: A Leverage-Aware Optimizer That Fixes a Hidden Neuron Death Problem in Muon - MarkTechPost

Why this matters

Aurora shows that a “leverage-aware” tweak can patch a hidden flaw in Muon, the optimizer many of us rely on. The method is designed to stop the silent death of a sizable share of MLP neurons, and the team supports it with a 1.1B-parameter pretraining run and a claimed state-of-the-art result on the modded-nanoGPT speedrun benchmark. The open-source release lets us inspect the changes directly.

U‑NorMuon, meanwhile, normalizes tall matrix rows to √(n/m) rather than 1, a simple adjustment that, in 340 M‑scale tests, beats both Muon and the standard NorMuon while fully eradicating neuron death; leverage scores stay roughly isotropic throughout training.

What remains unclear is whether these gains persist beyond the specific benchmarks reported, or how they translate to larger, production‑level models. We also lack details on computational overhead or stability across diverse architectures. For developers and researchers, Aurora and U‑NorMuon merit a closer look, but we should temper enthusiasm until broader evaluations confirm the reported improvements.
