AMD's MI355X CDNA4 GPU Shows Competitive Training Times in MLPerf v6.0

By AI Daily Post • Edited by Brian Petersen, Editor-in-Chief

June 16, 2026 • Updated: July 15, 2026

AMD has drawn level with Nvidia’s B200 on two single-node training benchmarks. In MLPerf Training v6.0, eight of the company’s new MI355X accelerators fine-tuned Llama 2 70B with LoRA in a time competitive with eight Nvidia B200 GPUs. The MI355X repeated that result on a Llama 3.1 8B pre-training run.

The same MLPerf round also saw AMD scale its MI325X chip across eight nodes to train Flux.1 Schnell, a text-to-image model. The round marked a debut as well: the first appearance of AMD’s Primus training framework, and of its MXFP4 training recipe running natively on the company’s own hardware.

AMD’s MLPerf Training v6.0 submission demonstrates continued progress across both hardware and software. On the hardware side, the CDNA4-generation MI355X delivers competitive time-to-train results against NVIDIA B200 on both single-node LLM benchmarks — Llama 2 70B LoRA fine-tuning and Llama 3.1 8B pretraining — at an iso-GPU count of 8, while the MI325X powers an 8-node Flux.1 Schnell text-to-image submission.

Technical Dive into AMD’s MLPerf Training v6.0 Submission - AMD ROCm AI Blog

The significance is in the execution. AMD ran its MXFP4 recipe directly on the MI355X's FP4 silicon, bypassing the drag of software emulation. Its Primus framework managed both benchmark types.
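To make the format concrete, here is a minimal Python sketch of MXFP4-style block quantization. It assumes the layout described in the Open Compute Project's Microscaling (MX) specification: blocks of 32 values, each stored as a 4-bit E2M1 element, sharing one 8-bit power-of-two scale. It is not AMD's recipe or Primus code. The function names are hypothetical, and rounding is simplified to nearest value, with ties going to the smaller magnitude. Hardware with FP4 support performs the equivalent arithmetic natively; emulation has to do it in software, much as this sketch does.

```python
import math

# Magnitudes representable by a 4-bit E2M1 element (the sign is a separate bit).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2      # exponent of the largest element value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32    # number of elements that share one scale
SCALE_BITS = 8     # the shared scale is an 8-bit power-of-two exponent
ELEMENT_BITS = 4


def quantize_block(values):
    """Return (shared_exponent, signed E2M1 values) for one block."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0.0] * len(values)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so floor(log2) = e - 1.
    _, e = math.frexp(max_abs)
    shared_exp = (e - 1) - E2M1_EMAX
    # Keep the exponent inside the range an 8-bit biased exponent can hold.
    shared_exp = max(-127, min(127, shared_exp))
    scale = 2.0 ** shared_exp
    quantized = []
    for v in values:
        scaled = abs(v) / scale
        # Nearest representable magnitude; values above 6.0 saturate to 6.0.
        nearest = min(E2M1_VALUES, key=lambda q: abs(q - scaled))
        quantized.append(math.copysign(nearest, v))
    return shared_exp, quantized


def dequantize_block(shared_exp, quantized):
    """Reconstruct approximate real values from a quantized block."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in quantized]


def quantize_tensor(values, block_size=BLOCK_SIZE):
    """Split a flat list into blocks and quantize each one independently."""
    return [
        quantize_block(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


def bits_per_element(block_size=BLOCK_SIZE):
    """Average storage cost per value, including the shared scale."""
    return (block_size * ELEMENT_BITS + SCALE_BITS) / block_size
```

For example, `quantize_block([0.1, -0.25, 0.7, 3.0])` picks a shared exponent of -1 and dequantizes to `[0.0, -0.25, 0.75, 3.0]`. The small value 0.1 collapses to zero, which shows how coarse four bits are and why the per-block scale matters. Under these assumptions, `bits_per_element()` returns 4.25 bits per value.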

That signals a more cohesive software stack. Nvidia, of course, still won more overall benchmarks. But on this specific eight-GPU battleground, AMD’s results are competitive.

The training performance gap for core AI workloads has tightened. Now, watch where the fight moves next.

Common Questions Answered

How did AMD's MI355X GPU perform against Nvidia's B200 in the MLPerf v6.0 benchmarks?

AMD's MI355X delivered time-to-train results competitive with Nvidia's B200 at an equal count of eight GPUs. It did so on both single-node LLM benchmarks: Llama 2 70B LoRA fine-tuning and Llama 3.1 8B pre-training. The results indicate that the training performance gap between the two companies has tightened for these workloads.

What is MXFP4 and how does AMD's implementation provide an advantage?

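MXFP4 is a 4-bit microscaling floating-point format in which each block of values shares a single scale factor. It is defined in the Open Compute Project's Microscaling (MX) specification rather than being proprietary to AMD. What debuted in the MLPerf v6.0 tests was AMD's MXFP4 training recipe, which ran directly on the MI355X's FP4 silicon and so avoided the overhead of software emulation.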

What role does AMD's Primus training framework play in these benchmark results?

Primus is AMD's training framework, and it made its first MLPerf appearance in the v6.0 round. It managed both benchmark types in AMD's submission, which signals a more cohesive software stack across the company's accelerators.

What was the significance of AMD's MI325X performance in the MLPerf v6.0 image generation tests?

AMD scaled its MI325X chip across eight nodes to train the Flux.1 Schnell text-to-image model in the MLPerf Training v6.0 round, showcasing its capability for multi-node deployment. The submission shows AMD competing in generative image workloads as well as in language model training.
