NVIDIA's Star Elastic bundles 30B, 23B, 12B models; 23B hits 85.63 on AIME-2025
Training a family of large language models has always been a cost‑heavy exercise. Every variant—whether 8 B, 30 B, or 70 B—needs its own full training run, its own storage footprint and its own deployment stack. For teams that serve inference at scale, that multiplier translates directly into higher compute bills and more engineering overhead. NVIDIA researchers are now proposing a different tack with a method they call **Star Elastic**.
Star Elastic is a post‑training technique that folds several nested submodels into a single parent reasoning model, all from one training run. The approach was demonstrated on **Nemotron Nano v3**, a hybrid Mamba–Transformer–Mixture‑of‑Experts architecture that carries 30 B total parameters but only 3.6 B active ones. Within that checkpoint the team extracted a 23 B variant (2.8 B active) and a 12 B variant (2.0 B active), each trained on roughly 160 B tokens.
Because the three models share a single checkpoint, they can be pulled out without any extra fine‑tuning. The result is a compact, multi‑size family that sidesteps the traditional “one model, one run” paradigm.
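The idea of nested submodels that live inside one set of weights can be illustrated with a toy example. The sketch below is not NVIDIA's method: the article does not describe how Star Elastic chooses which components each variant keeps, so the code simply assumes each smaller model is a prefix slice of the parent's rows. The names `make_parent`, `slice_submodel` and `forward` are hypothetical helpers written for this illustration.

```python
# Toy illustration of nested submodels sharing one set of weights.
# Assumption: a submodel is a prefix slice of the parent's rows.
# Star Elastic's real selection mechanism is not described in the article.

def make_parent(rows, cols):
    """Build a deterministic toy weight matrix as a list of rows."""
    return [[(r * cols + c) / 100.0 for c in range(cols)] for r in range(rows)]

def slice_submodel(weights, keep_rows):
    """Return the first `keep_rows` rows of the parent.

    The row objects are shared with the parent, so no weights are
    copied or retrained when a smaller variant is pulled out.
    """
    if not 0 < keep_rows <= len(weights):
        raise ValueError("keep_rows must be between 1 and the parent size")
    return weights[:keep_rows]

def forward(weights, x):
    """Matrix-vector product: one output per retained row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

parent = make_parent(rows=6, cols=4)          # stands in for the full model
medium = slice_submodel(parent, keep_rows=4)  # stands in for a mid-size variant
small = slice_submodel(parent, keep_rows=2)   # stands in for the smallest variant

x = [1.0, 0.5, -1.0, 2.0]
full_out = forward(parent, x)
medium_out = forward(medium, x)
small_out = forward(small, x)
```

In this toy setup each smaller variant reproduces a prefix of the parent's output and stores nothing of its own, which is the property that lets one checkpoint stand in for several models.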
The Elastic-23B notably scores 85.63 on AIME-2025 versus Qwen3-30B-A3B’s 80.00, despite having fewer active parameters.
On training cost, the research team reports a 360× token reduction compared to pretraining each variant from scratch, and a 7× reduction over prior state-of-the-art compression methods that require sequential distillation runs per model size. The 12B variant runs at 2.4× the throughput of the 30B parent on an H100 GPU at bfloat16 with the same input/output sequence lengths.
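The reported figures lend themselves to a few back-of-the-envelope calculations. The sketch below uses only numbers quoted in this article; the helper names are hypothetical. The baseline-token estimate rests on a loud assumption: that the 360× figure is a plain ratio of token budgets and that the roughly 160 B tokens is the budget it is measured against. The article does not confirm either point.

```python
# Parameter counts quoted in the article, in billions.
VARIANTS = {
    "30B": {"total": 30.0, "active": 3.6},
    "23B": {"total": 23.0, "active": 2.8},
    "12B": {"total": 12.0, "active": 2.0},
}

def active_fraction(total, active):
    """Share of a variant's parameters that are active per token."""
    return active / total

def implied_baseline_tokens(elastic_tokens, reduction_factor):
    """Token budget implied for the from-scratch baseline.

    Assumption: the reduction factor is a simple ratio of token budgets.
    """
    return elastic_tokens * reduction_factor

def relative_time_per_token(speedup):
    """Time per token relative to the parent, given a throughput speedup."""
    return 1.0 / speedup

fractions = {name: round(active_fraction(**v), 3) for name, v in VARIANTS.items()}
baseline = implied_baseline_tokens(160e9, 360)        # assumed inputs, see above
rel_time_12b = round(relative_time_per_token(2.4), 3)

print(fractions)     # active share per variant
print(baseline)      # implied from-scratch tokens under the stated assumption
print(rel_time_12b)  # 12B time per token relative to the 30B parent
```

Under these assumptions the active share is about 12% for both the 30B and 23B variants and about 17% for the 12B variant, and the 2.4× throughput figure corresponds to the 12B variant spending roughly 42% of the parent's time per token.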
Why this matters
NVIDIA’s Star Elastic packs 30 B, 23 B and 12 B models into a single checkpoint, promising developers a way to sidestep the usual multiplier of training runs, storage and deployment pipelines. The 23 B variant already posted an 85.63 score on AIME‑2025, edging out Qwen3‑30B‑A3B’s 80.00 despite using fewer active parameters. That result suggests the slicing approach can retain performance while shrinking the active model size.
The team also claims a 360× token reduction in training cost versus pretraining each model from scratch, and a 7× reduction relative to prior state-of-the-art compression methods, which require a separate distillation run for each model size. If those reductions hold in production, teams that maintain several model sizes could see noticeably lower expense. Yet the report stops short of detailing real-world latency, memory overhead when switching slices, or how the single checkpoint scales across diverse hardware.
It’s unclear whether the gains translate beyond benchmark settings. We’ll watch how developers integrate Star Elastic into existing stacks and whether the promised efficiencies survive the rigors of deployment.