NVIDIA Blackwell Ultra Shatters MLPerf Inference Records
NVIDIA's 288-GPU Blackwell Ultra Sets New MLPerf Inference Throughput Record
NVIDIA’s latest hardware push is more than a headline; it’s a concrete test of how far inference performance can be stretched when architecture, software, and system integration are engineered as a single unit. The company paired its Blackwell Ultra silicon with a cluster that dwarfs typical research configurations, then ran the MLPerf Inference suite, a widely respected benchmark that measures real‑world AI serving speed. By scaling the deployment to a size never before attempted in the benchmark, NVIDIA forced it to confront bottlenecks in data movement, memory bandwidth, and orchestration that smaller setups simply avoid.
The result is a set of numbers that push token‑processing rates into the multi‑million‑per‑second range, hinting at what could be feasible for large‑scale language‑model services. As the community looks toward the next phase of the benchmark—MLPerf Endpoints—this extreme co‑design effort offers a glimpse of the engineering trade‑offs that will shape future deployments.
With 288 Blackwell Ultra GPUs, the largest scale ever submitted to any benchmark in MLPerf Inference, NVIDIA's submissions set new system-level throughput records, processing millions of tokens per second. Delivering that level of inference throughput takes extreme co-design across chips, system architecture, data center design, and software. The latest MLPerf Inference v6.0 results show NVIDIA delivering unmatched inference throughput across the broadest range of workloads on industry-standard benchmarks, from massive LLMs to advanced vision-language models to generative recommender systems and more. Meanwhile, AI inference workloads continue to evolve rapidly as model sizes grow and context lengths rise.
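To put "millions of tokens per second" across 288 GPUs in perspective, the sketch below divides a hypothetical aggregate throughput by the GPU count to get a per-GPU share. The aggregate figure is an assumption chosen purely for illustration; the actual measured numbers are reported per scenario in the MLPerf Inference v6.0 results.

```python
# Back-of-the-envelope: per-GPU share of an aggregate token throughput.
# NOTE: aggregate_tokens_per_sec is a hypothetical value for illustration,
# not a figure from the MLPerf submission.

NUM_GPUS = 288                        # Blackwell Ultra GPUs in the submission
aggregate_tokens_per_sec = 5_000_000  # hypothetical "millions of tokens/s"

# Each GPU's share of the aggregate, assuming an even split.
per_gpu = aggregate_tokens_per_sec / NUM_GPUS
print(f"Per-GPU throughput: {per_gpu:,.0f} tokens/s")
```

In practice the split is not perfectly even, since interconnect, batching, and scheduling overheads vary across the cluster, which is exactly why system-level benchmarks matter more than per-chip specs here.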
Did the new record prove anything beyond raw chip speed? NVIDIA’s 288‑GPU Blackwell Ultra system pushed MLPerf Inference v6.0 to a throughput measured in millions of tokens per second, the highest ever reported for a single submission. The achievement rests on a tightly coupled stack of hardware, software and model optimizations, a point the article emphasizes repeatedly.
Yet the benchmark focuses on system‑level performance rather than isolated silicon metrics, suggesting that raw transistor counts may matter less than integration. The record also highlights the importance of token‑based revenue models for AI factories, though how this translates to commercial profitability remains unclear. Looking ahead, the mention of upcoming MLPerf Endpoints hints at further testing, but the article provides no details on what those results might entail.
Consequently, while the numbers are impressive, the broader impact on real‑world deployments remains uncertain. The data underscores that extreme co‑design can deliver record throughput, but whether this approach scales cost‑effectively across diverse workloads is an open question.
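The "token-based revenue" framing above can be made concrete with a toy calculation: revenue rate is just serving throughput times price per token. Both inputs below are hypothetical assumptions for illustration, not figures from the article or from NVIDIA.

```python
# Hypothetical "AI factory" economics: revenue rate = throughput * token price.
# Both values are illustrative assumptions, not reported data.

tokens_per_sec = 1_000_000        # hypothetical sustained serving throughput
price_per_million_tokens = 0.50   # hypothetical $ per 1M generated tokens

# Tokens served in an hour, converted to dollars at the assumed price.
revenue_per_hour = tokens_per_sec * 3600 * price_per_million_tokens / 1_000_000
print(f"Revenue at this rate: ${revenue_per_hour:,.2f}/hour")
```

Even a toy model like this shows why throughput records translate directly into the commercial story: doubling tokens per second at fixed cost doubles the top line, which is the lens through which "AI factory" operators read these benchmarks.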
Further Reading
- NVIDIA Blackwell Ultra Sets New Inference Records in MLPerf Debut - NVIDIA Developer Blog
- NVIDIA Blackwell Ultra sets new performance records in MLPerf ... - The Tech Revolutionist
- NVIDIA Blackwell Ultra Sets the Bar in New MLPerf Inference ... - NVIDIA Blogs
- Nvidia claims software and hardware upgrades allow Blackwell Ultra GB300 to dominate MLPerf benchmarks — touts 45% DeepSeek R-1 inference ... - Tom's Hardware
Common Questions Answered
How many GPUs were used in NVIDIA's Blackwell Ultra MLPerf Inference submission?
NVIDIA deployed 288 Blackwell Ultra GPUs in its MLPerf Inference v6.0 submission, which represents the largest scale ever attempted in this benchmark. This massive GPU cluster enabled unprecedented system-level throughput, processing millions of tokens per second.
What makes the Blackwell Ultra MLPerf Inference result significant beyond raw speed?
The result demonstrates NVIDIA's ability to achieve extreme co-design across multiple system components, including chips, system architecture, data center design, and software. The achievement highlights that performance is not just about individual GPU capabilities, but the integrated optimization of the entire computing stack.
What key performance metric did NVIDIA achieve in the MLPerf Inference v6.0 benchmark?
NVIDIA's 288-GPU Blackwell Ultra system set a new record for inference throughput, processing millions of tokens per second. This benchmark result represents the highest throughput ever reported for a single submission in MLPerf Inference.