AWS unveils Trainium3 UltraServers, previews Trainium4 with 6× FP4 boost
AWS rolled out its latest Trainium3 UltraServers this week, positioning the hardware as the next step for customers who need large‑scale AI acceleration in the cloud. The announcement came alongside a preview of the follow‑on chip, Trainium4, which the company says will push floating‑point throughput considerably further. While the Trainium3 line already promises higher memory bandwidth and tighter integration with existing AWS services, the tease of a successor hints at a broader strategy to keep the stack competitive against external GPU offerings.
For enterprises that blend custom silicon with NVIDIA hardware, the prospect of a chip that can connect over NVIDIA NVLink Fusion may simplify mixed‑workload deployments. And if the performance claims hold, a six‑fold jump in FP4 capability could reshape cost calculations for large‑scale model training. The details are still early, but the roadmap suggests AWS is betting on tighter CPU‑GPU collaboration, a push into low‑precision FP4 compute, and continued gains in FP8 throughput.
AWS also revealed early details of Trainium4, expected to deliver at least 6x the processing performance in FP4, along with higher FP8 performance and memory bandwidth. The next-generation chip will support NVIDIA NVLink Fusion interconnects to operate alongside NVIDIA GPUs and AWS Graviton processors in MGX racks. AWS has already deployed more than 1 million Trainium chips to date.
The company says the latest performance improvements translate to faster training and lower inference latency. In internal tests using OpenAI's GPT-OSS open-weight model, Trn3 UltraServers delivered three times higher throughput per chip and four times faster response times compared to Trn2 UltraServers.
Trainium3 UltraServers are now generally available, AWS announced at re:Invent 2025. Built on a 3nm process, the new servers claim up to 4.4× more compute performance than Trainium2, four times the energy efficiency, and nearly four times the memory bandwidth. Each UltraServer can host up to 144 Trainium3 chips, delivering as much as 362 petaflops of FP8 compute.
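As a rough sanity check on the headline figures, the per‑chip throughput implied by the UltraServer spec can be worked out directly (this assumes the 362 figure is aggregate FP8 petaflops per fully populated UltraServer, which is how AWS typically quotes such numbers):

```python
# Back-of-the-envelope math for the Trainium3 UltraServer headline figures.
# Assumption: 362 is aggregate FP8 petaflops for a fully populated UltraServer.

CHIPS_PER_ULTRASERVER = 144
AGGREGATE_FP8_PFLOPS = 362

# Implied per-chip FP8 throughput in petaflops.
per_chip_pflops = AGGREGATE_FP8_PFLOPS / CHIPS_PER_ULTRASERVER
print(f"~{per_chip_pflops:.2f} FP8 petaflops per Trainium3 chip")
```

Dividing out gives roughly 2.5 FP8 petaflops per chip, a figure that only holds at the stated maximum configuration; smaller deployments would scale the aggregate number down accordingly.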
The headline numbers sound strong, but real‑world workloads will determine actual cost savings. AWS also previewed Trainium4, saying it will deliver at least six times the FP4 processing performance along with higher FP8 throughput and memory bandwidth. The next‑gen chip will support NVIDIA NVLink Fusion interconnects, allowing it to run alongside NVIDIA GPUs and AWS Graviton processors. Whether that integration will be seamless remains uncertain, and developers will need to assess compatibility with their existing stacks before adopting the new interconnect.
In short, AWS is pushing a faster, more efficient training stack, but the practical impact on model training pipelines is still to be measured.
Further Reading
- AWS unveils next-gen Trainium3 custom AI chips and cloud Trainium2 instances - SiliconAngle
- AWS' Trainium2 chips for building LLMs are now generally available, with Trainium3 coming in late 2025 - TechCrunch
- Amazon.com Inc (AMZN) Unveils Advanced AI Capabilities with New AWS Trainium2 and Trainium3 Chips - GuruFocus
- Amazon promises 4x faster AI silicon in 2025, turns Trainium2 loose on the net - The Register
Common Questions Answered
What performance improvements does the Trainium3 UltraServer claim over Trainium2?
The Trainium3 UltraServer, built on a 3nm process, claims up to 4.4× more compute performance, four times the energy efficiency, and nearly four times the memory bandwidth compared to Trainium2. Each UltraServer can host up to 144 Trainium3 chips, delivering as much as 362 petaflops of FP8 compute.
How does AWS describe the expected floating‑point performance of the upcoming Trainium4 chip?
AWS previews Trainium4 as delivering at least a 6× boost in FP4 processing performance, along with higher FP8 performance and increased memory bandwidth. The chip will also support NVIDIA NVLink Fusion interconnects, enabling it to work alongside NVIDIA GPUs and AWS Graviton processors in MGX racks.
When were the Trainium3 UltraServers made generally available and what event coincided with the announcement?
The Trainium3 UltraServers became generally available at AWS re:Invent 2025, where the company officially rolled out the new hardware. The announcement highlighted the servers' 3nm manufacturing process and their significant performance and efficiency gains.
What does AWS say about the scale of its existing Trainium chip deployments?
AWS states that it has already deployed more than 1 million Trainium chips across its infrastructure. This large deployment base underscores the company's commitment to custom AI silicon and sets the stage for the newer Trainium3 and Trainium4 offerings.