Blackwell Ultra speeds up AI today; Nvidia Rubin platform still months from launch
Why does this matter now? Nvidia just announced that its Blackwell Ultra chips are already delivering faster inference for the latest AI workloads, while its next‑generation platform, Vera Rubin, remains months from general availability. The hardware gains are tangible, but the company's roadmap hinges on how quickly developers can tap into the next generation of mixture‑of‑experts (MoE) models without rewiring their pipelines.
Here’s the thing: the Blackwell Ultra line is being positioned as a purpose‑built platform for cutting‑edge models, yet the broader ecosystem will only unlock the roadmap's full potential when the Rubin platform arrives. The timing is crucial; enterprises eager to scale large language models are watching both the silicon rollout and the software enablement closely. Nvidia's leadership in AI hardware has long been a selling point, but extending that lead into the next generation could determine whether the firm stays ahead of rivals.
In that context, Salvator’s comments about Blackwell Ultra’s market position and Rubin’s role in extending Nvidia’s lead take on added weight.
Salvator noted that the high-end Blackwell Ultra is a market-leading platform purpose-built to run state-of-the-art AI models and applications. He added that the Nvidia Rubin platform will extend the company's market leadership and enable the next generation of MoE models to power a new class of applications, taking AI innovation even further. Salvator explained that Vera Rubin is built to address the growing compute demand created by continuing growth in model size and in reasoning token generation from leading models, particularly MoE architectures.
"Blackwell and Rubin can serve the same models, but the difference is the performance, efficiency and token cost," he said. According to Nvidia's early testing results, compared to Blackwell, Rubin can train large MoE models with a quarter the number of GPUs, generate inference tokens at 10x higher throughput per watt, and serve inference at one-tenth the cost per token.
Rubin arrives with impressive numbers, yet its real‑world impact remains uncertain until the chips actually reach servers. Nvidia touts 50 PFLOPs of NVFP4 inference and 35 PFLOPs of NVFP4 training, claims that translate to roughly 5x the inference speed and 3.5x the training throughput of the current Blackwell Ultra.
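Nvidia has not published the baseline figures behind those multipliers, but the quoted numbers let us back them out. Below is a minimal Python sketch of the arithmetic, using only the claims reported above; every constant is Nvidia's own claim rather than an independent benchmark, and the dollar figure in the cost example is purely hypothetical.

```python
# Back-of-the-envelope check of Nvidia's stated Rubin-vs-Blackwell-Ultra claims.
# All constants are Nvidia's claims as reported in this article, not measurements.

RUBIN_INFERENCE_PFLOPS = 50.0  # claimed NVFP4 inference compute
RUBIN_TRAINING_PFLOPS = 35.0   # claimed NVFP4 training compute
INFERENCE_SPEEDUP = 5.0        # claimed speedup vs. Blackwell Ultra
TRAINING_SPEEDUP = 3.5         # claimed speedup vs. Blackwell Ultra

# Implied Blackwell Ultra baselines, if the multipliers hold:
implied_inference = RUBIN_INFERENCE_PFLOPS / INFERENCE_SPEEDUP  # -> 10 PFLOPs
implied_training = RUBIN_TRAINING_PFLOPS / TRAINING_SPEEDUP     # -> 10 PFLOPs
print(f"Implied Blackwell Ultra NVFP4 inference: {implied_inference:.0f} PFLOPs")
print(f"Implied Blackwell Ultra NVFP4 training:  {implied_training:.0f} PFLOPs")

# The early-testing cost claim: inference at 1/10th the cost per token.
def rubin_cost_per_million_tokens(blackwell_cost_usd: float) -> float:
    """Hypothetical Rubin cost per million tokens, if the 1/10th claim holds."""
    return blackwell_cost_usd / 10.0

# Illustrative only: a hypothetical $2.00 per 1M tokens on Blackwell implies $0.20.
print(f"Example Rubin cost per 1M tokens: ${rubin_cost_per_million_tokens(2.00):.2f}")
```

Read together, both multipliers imply a Blackwell Ultra baseline of roughly 10 PFLOPs of NVFP4 compute for inference and training alike; whether the claimed ratios hold outside Nvidia's own testing is exactly the open question.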
The Blackwell Ultra, described as a market‑leading platform built for state‑of‑the‑art models, will continue to power workloads while Rubin remains in development. Because the Rubin platform won’t be available until the second half of 2026, customers must decide whether to wait for the promised performance boost or stick with existing hardware. Will the promised gains translate into real‑world advantage, or will software and cost factors blunt the impact?
If the gains materialize, Nvidia could extend its market leadership and enable the next generation of mixture‑of‑experts models to power new classes of applications. However, beyond Nvidia's own early-testing claims there is no independent data on power efficiency, software ecosystem readiness, or pricing, leaving open questions about adoption hurdles. In short, the hardware promises are clear, but the path to widespread use remains to be demonstrated.
Common Questions Answered
How much faster is the Nvidia Rubin platform's inference compared to the current Blackwell Ultra?
Nvidia claims the Rubin platform delivers roughly five times the inference speed of the existing Blackwell Ultra, achieving about 50 PFLOPs of NVFP4 inference performance. This represents a substantial jump in throughput for state‑of‑the‑art AI models.
What role does the Vera Rubin platform play in supporting next‑generation mixture‑of‑experts (MoE) models?
Vera Rubin is designed to handle the growing compute demand caused by larger model sizes and to enable developers to run next‑generation MoE models without redesigning their pipelines. It aims to extend Nvidia's market leadership by simplifying integration of these complex architectures.
When is the Vera Rubin platform expected to become generally available, and what is its current development status?
The Vera Rubin platform remains in development and is still months from general availability, with launch expected in the second half of 2026. Although Nvidia touts impressive performance numbers, its real‑world impact will not be clear until the chips are actually deployed in servers.
What training performance improvements does the Nvidia Rubin platform claim over the Blackwell Ultra?
Nvidia states that Rubin provides about 35 PFLOPs of NVFP4 training throughput, roughly 3.5 times that of the current Blackwell Ultra. This increase is intended to accelerate the training of larger, more complex AI models.