Editorial illustration for NVIDIA Blackwell scales to 8,192 GPUs on DeepSeek‑V3 671B for MLPerf 6.0

NVIDIA Blackwell scales to 8,192 GPUs on DeepSeek‑V3...

By AI Daily Post Edited by Brian Petersen, Editor-in-Chief

June 16, 2026 • Updated: July 15, 2026 • 3 min read

Eight thousand one hundred and ninety-two GPUs. Nvidia marshaled them to train a single model, DeepSeek-V3 671B, for the latest MLPerf benchmark. This isn't a lab stunt. It's a brutal, practical demonstration of the company's Blackwell architecture and, more importantly, its ability to choreograph that many chips across multiple data centers.

The numbers are blunt. Microsoft Azure trained the Llama 3.1 405B model to the benchmark's quality target in 7.07 minutes using 8,192 Blackwell GPUs. CoreWeave did the same for the more complex DeepSeek-V3 model in 2.02 minutes at the same scale.
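For a rough sense of scale, the two wall-clock times above can be converted into aggregate GPU time. This is a back-of-the-envelope sketch using only the figures quoted in this article; "GPU-hours" here is a derived number, not something MLPerf reports, and the helper name `gpu_hours` is made up for illustration.

```python
# Figures quoted in the article; both runs used 8,192 Blackwell GPUs.
GPUS = 8192
RUNS = {
    "Llama 3.1 405B (Microsoft Azure)": 7.07,  # minutes to quality target
    "DeepSeek-V3 671B (CoreWeave)": 2.02,      # minutes to quality target
}


def gpu_hours(gpus: int, minutes: float) -> float:
    """Aggregate GPU time: number of GPUs times wall-clock hours."""
    return gpus * minutes / 60.0


if __name__ == "__main__":
    for name, minutes in RUNS.items():
        total = gpu_hours(GPUS, minutes)
        print(f"{name}: {minutes} min wall-clock, about {total:.0f} GPU-hours")
```

Under these assumptions the Llama run comes to roughly 965 GPU-hours and the DeepSeek-V3 run to roughly 276. The calculation covers only the timed benchmark run and says nothing about cluster setup or cost.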

These results are only possible because Nvidia and its partners treated the hardware, networking, and software as a single engineering problem. The speed isn't just about raw silicon. It's about an entire stack being forced to cooperate.

On DeepSeek-V3 671B, the largest MoE model in the suite, NVIDIA scaled its submission to 8,192 GPUs using GB200 NVL72 systems, the largest-scale Blackwell-based submission in MLPerf Training to date.

Fastest, Largest, Strongest: NVIDIA Blackwell Sweeps MLPerf Training 6.0 - NVIDIA AI Blog
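To put the quoted scale in physical terms, a GB200 NVL72 system packs 72 Blackwell GPUs into one rack, so 8,192 GPUs implies well over a hundred racks. The sketch below works that out; it assumes every rack is fully populated, which the article does not state, and the function name `racks_needed` is hypothetical.

```python
import math

GPUS_PER_NVL72_RACK = 72  # GB200 NVL72: 72 Blackwell GPUs per rack-scale system
TOTAL_GPUS = 8192         # submission size quoted in the article


def racks_needed(total_gpus: int, gpus_per_rack: int = GPUS_PER_NVL72_RACK) -> int:
    """Smallest number of racks that can hold total_gpus (assumes full racks)."""
    return math.ceil(total_gpus / gpus_per_rack)


if __name__ == "__main__":
    full_racks, leftover = divmod(TOTAL_GPUS, GPUS_PER_NVL72_RACK)
    print(f"{full_racks} full racks plus {leftover} GPUs in one more rack")
    print(f"racks needed: {racks_needed(TOTAL_GPUS)}")
```

The real submission's layout may differ, for example if some racks were only partly used, so treat 114 as a lower bound rather than a reported figure.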

These benchmarks are Nvidia's strongest argument for its total platform. The real competition isn't just chip versus chip. It's ecosystem versus ecosystem.

When a company like CoreWeave can take a cluster of over eight thousand GPUs, make them behave like a single, cohesive machine, and finish a benchmark training run in about two minutes, it sets a practical floor for what everyone else has to build. This is how moats are dug now. Not with specs, but with system-level proof that no one else can easily replicate.

Common Questions Answered

How many GPUs did NVIDIA use to train DeepSeek-V3 671B for MLPerf 6.0?

NVIDIA marshaled 8,192 GPUs to train the DeepSeek-V3 671B model for the latest MLPerf benchmark. That scale demonstrates the capability of NVIDIA's Blackwell architecture and the company's ability to orchestrate that many chips across multiple data centers.

What does the MLPerf 6.0 benchmark reveal about NVIDIA's Blackwell platform?

The MLPerf 6.0 benchmark demonstrates NVIDIA's Blackwell architecture's practical ability to choreograph thousands of GPUs working together cohesively. These benchmarks serve as NVIDIA's strongest argument for its total platform, showing system-level proof that competitors cannot easily replicate.

How quickly can CoreWeave integrate over 8,000 GPUs into a functioning system according to the article?

The article's figure refers to training time, not setup time: CoreWeave trained DeepSeek-V3 to the benchmark's quality target in 2.02 minutes on 8,192 Blackwell GPUs, making a cluster of over 8,000 GPUs behave like a single, cohesive machine. That result sets a practical floor for what other companies must achieve to compete in the ecosystem.

Why does the article emphasize that GPU competition is 'ecosystem versus ecosystem' rather than 'chip versus chip'?

The article argues that moats in the GPU industry are now dug through system-level proof and ecosystem capabilities rather than individual chip specifications. When companies like CoreWeave can seamlessly coordinate thousands of GPUs across data centers, this integrated platform approach becomes the real competitive advantage that others must replicate.
