
Azure ND GB300 VM Boosts AI Inference with 50% More GPU Memory

Microsoft's Azure ND GB300 VM hits 1.1M tokens/s, backed by 50% more GPU memory


Microsoft is pushing the boundaries of AI infrastructure with its latest Azure virtual machine, designed to supercharge machine learning workloads. The Azure ND GB300 VM represents a significant leap in computational power, promising to accelerate AI inference tasks for businesses and researchers hungry for faster processing.

Cloud computing's AI demands are skyrocketing, and Microsoft's new offering appears tailor-made for organizations wrestling with complex machine learning models. By boosting GPU memory and increasing thermal design power, the company is signaling a serious commitment to high-performance AI computing.

But raw specs only tell part of the story. The real test comes in practical performance, where speed and efficiency can make or break AI applications. How much faster can complex models run? What kind of workloads will see the most dramatic improvements?

Microsoft's own benchmarks offer an intriguing preview of the VM's potential, with early testing suggesting impressive gains that could reshape how companies approach large-scale AI inference.

The VM is optimized for inference workloads, featuring 50% more GPU memory and a 16% higher TDP (thermal design power). To measure the gains, Microsoft ran the Llama 2 70B benchmark (in FP4 precision) from MLPerf Inference v5.1 on each of the 18 ND GB300 v6 virtual machines in a single NVIDIA GB300 NVL72 domain, using NVIDIA TensorRT-LLM as the inference engine.

"One NVL72 rack of Azure ND GB300 v6 achieved an aggregated 1,100,000 tokens/s," said Microsoft. "This is a new record in AI inference, beating our own previous record of 865,000 tokens/s on one NVIDIA GB200 NVL72 rack with the ND GB200 v6 VMs." With 72 Blackwell Ultra GPUs in the rack, that aggregate works out to roughly 15,200 tokens/s per GPU.
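The headline figures can be sanity-checked with a few lines of arithmetic. The constants below are the values quoted in the article, not fresh measurements; dividing the rounded 1.1M aggregate by 72 GPUs gives a slightly higher number than the ~15,200 tokens/s per GPU Microsoft cites, which presumably derives from the exact measured aggregate:

```python
# Back-of-the-envelope check of the published figures.
# All inputs are the numbers quoted in the article, not re-measured here.
GB300_AGGREGATE = 1_100_000  # tokens/s, one ND GB300 v6 NVL72 rack (rounded)
GB200_AGGREGATE = 865_000    # tokens/s, previous ND GB200 v6 record
GPUS_PER_RACK = 72           # Blackwell Ultra GPUs per NVL72 domain

per_gpu = GB300_AGGREGATE / GPUS_PER_RACK      # per-GPU share of the aggregate
speedup = GB300_AGGREGATE / GB200_AGGREGATE    # rack-level gain over GB200

print(f"Per-GPU throughput: {per_gpu:,.0f} tokens/s")  # ~15,278 from the rounded aggregate
print(f"Gain over the GB200 record: {speedup - 1:.0%}")
```

The ratio against the previous GB200 record comes out to roughly a 27% rack-level improvement, which is the headline comparison Microsoft is drawing.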

Taken together, the results mark a clear step up in AI inference performance. The new virtual machine delivers 1.1 million tokens per second, enabled by 50% more GPU memory and a 16% higher thermal design power.

The benchmark, which used Llama2 70B in FP4 precision, demonstrates the potential for accelerated machine learning workloads. Microsoft's test across 18 virtual machines on a single NVIDIA GB300 NVL72 domain showcases the scalability of this infrastructure.

Performance gains aren't just theoretical. By pairing the hardware with NVIDIA TensorRT-LLM as the inference engine, Microsoft has set a new record in AI inference throughput. The results suggest cloud-based AI systems are becoming markedly more efficient.

Still, questions remain about real-world application and consistent performance across different workloads. While the numbers are promising, practical deployment will ultimately determine the VM's true value.

For now, Microsoft has set a new performance marker in AI infrastructure. Researchers and enterprises tracking computational efficiency will want to watch this development closely.

Further Reading

Common Questions Answered

How much GPU memory does the Azure ND GB300 VM offer compared to previous models?

The Azure ND GB300 VM provides 50% more GPU memory than the previous generation, the ND GB200 v6. This significant memory increase is designed to support more complex machine learning and AI inference workloads, enabling faster and more efficient processing of large AI models.

What performance benchmark did Microsoft use to demonstrate the Azure ND GB300 VM's capabilities?

Microsoft used the Llama 2 70B model in FP4 precision from MLPerf Inference v5.1 to benchmark the VM's performance. By running the test across 18 ND GB300 v6 virtual machines on a single NVIDIA GB300 NVL72 domain, they achieved 1.1 million tokens per second in aggregate, setting a new record in AI inference performance.

What makes the Azure ND GB300 VM particularly suitable for AI and machine learning workloads?

The Azure ND GB300 VM is specifically optimized for inference workloads, featuring 50% more GPU memory and a 16% higher Thermal Design Power (TDP). These enhancements allow organizations to process complex machine learning models more efficiently, addressing the rapidly growing computational demands of AI infrastructure.