AWS continues to offer services based on Nvidia’s Hopper chips, which remain crucial for training AI systems. The decision to transition Project Ceiba to Blackwell chips follows Nvidia’s announcement in March, which highlighted the superior performance of the new GPUs.
Blackwell promises a performance boost
Nvidia’s new Blackwell chips, unveiled by CEO Jensen Huang in March, are expected to be twice as powerful as their predecessors for training large language models (LLMs) such as those behind OpenAI’s ChatGPT.
The Nvidia GB200 Grace Blackwell Superchip connects two Nvidia B200 Tensor Core GPUs to the Nvidia Grace CPU over a 900GBps ultra-low-power NVLink chip-to-chip interconnect.
To achieve the highest AI performance, GB200-powered systems can be paired with the newly announced Nvidia Quantum-X800 InfiniBand and Spectrum-X800 Ethernet platforms, which offer advanced networking capabilities at speeds up to 800Gbps.
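The two bandwidth figures quoted above use different units: the NVLink chip-to-chip link is given in gigabytes per second (GBps), while the networking platforms are given in gigabits per second (Gbps). The short Python sketch below puts them on the same scale. It uses only the peak numbers cited in this article, the function and variable names are illustrative, and the result is a unit conversion rather than a measure of real-world throughput.

```python
BITS_PER_BYTE = 8

def gigabytes_to_gigabits(gigabytes_per_second: float) -> float:
    """Convert a rate in GB/s to Gb/s."""
    return gigabytes_per_second * BITS_PER_BYTE

def gigabits_to_gigabytes(gigabits_per_second: float) -> float:
    """Convert a rate in Gb/s to GB/s."""
    return gigabits_per_second / BITS_PER_BYTE

# Peak figures as quoted in the article (not measured values).
NVLINK_C2C_GIGABYTES_PER_S = 900   # GB200 NVLink chip-to-chip interconnect
NETWORK_GIGABITS_PER_S = 800       # Quantum-X800 / Spectrum-X800, "up to"

nvlink_in_gigabits = gigabytes_to_gigabits(NVLINK_C2C_GIGABYTES_PER_S)
network_in_gigabytes = gigabits_to_gigabytes(NETWORK_GIGABITS_PER_S)
ratio = NVLINK_C2C_GIGABYTES_PER_S / network_in_gigabytes

print(f"NVLink chip-to-chip: {nvlink_in_gigabits:.0f} Gb/s")
print(f"Network link: {network_in_gigabytes:.0f} GB/s")
print(f"On-package link is {ratio:.0f}x the quoted network peak")
```

On these quoted peaks, the on-package link works out to 7,200Gbps, or nine times the 800Gbps network figure, which is why the two numbers should not be compared without converting units first.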
“Selecting the flagship Blackwell chips in lieu of less powerful Grace Hoppers from Nvidia makes more sense for advancing its AI training of LLMs, LVMs, and simulation with applications across industries,” said Neil Shah, VP for research and partner at Counterpoint Research. “For AWS, especially with this rapid evolution of the size of the training models, the cloud giant has to be prudent of its investments in getting the best ROI for the advanced compute investments as well as the efficiency of those compute from an energy consumption perspective. With Project Ceiba, the goalposts are actually moving and Amazon needs to be at the leading edge to catch up with Google, and Microsoft in this AI race.”
AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure will be among the first cloud service providers to offer Blackwell-powered instances, Nvidia said in its March announcement. Companies in the Nvidia Cloud Partner program, including Applied Digital, CoreWeave, Crusoe, IBM Cloud, and Lambda, will also offer these instances.