“So we’re the most efficient solution when it comes to SPECint rate, and you can obviously translate that into performance at a rack level as well, which AMD and Intel have started to talk more about as well,” said Wittich.
He acknowledged that the benchmark is not a real-life workload, but he said that when compared to typical web stack workloads like nginx, Redis, MySQL, and Memcached, Ampere leads against the competition. “If you put all these together and look at it at the rack level, you need a lot less space, a lot less servers, and 35% less power when you’re deploying based on AmpereOne versus AMD Genoa or AMD Bergamo,” he said.
In other news, Ampere is working with Qualcomm Technologies to scale out a joint solution featuring Ampere CPUs and the Qualcomm Cloud AI 100 Ultra. The solution targets LLM inference on the industry’s largest generative AI models.
Ampere also announced that Meta’s Llama 3 is now running on Ampere CPUs in Oracle Cloud. Performance data shows that running Llama 3 on the 128-core Ampere Altra CPU (the predecessor to AmpereOne) with no GPU delivers the same performance as an Nvidia A10 GPU paired with an x86 CPU, while using one-third of the power, according to Ampere.
Lastly, Ampere announced the formation of a Universal Chiplet Interconnect Express (UCIe) working group as part of the AI Platform Alliance it formed last year. This coalition of chip designers intends to pool its resources and talent to advance AI chip development. UCIe is designed around open silicon integration, offering an industry-wide open standard for building SoC-level solutions in which chiplets from different companies are integrated into a single SoC.
“We believe that there’s a need for open solutions across the industry that are broadly available to everybody that aren’t walled gardens and that aren’t proprietary. So we are building these best-in-class solutions at the server level with the fast time to market and give people access to the market,” said Wittich.