Intel kicked off the Supercomputing 2023 conference with a series of high-performance computing (HPC) announcements, including a new Xeon line and an update on its Gaudi AI processor.
Intel will ship its fifth-generation Xeon Scalable Processor, codenamed Emerald Rapids, to OEM partners on December 14. Emerald Rapids tops out at 64 cores, up slightly from the 56 cores of the fourth-gen Xeon.
In addition to more cores, Emerald Rapids will feature higher frequencies, hardware acceleration for FP16, and support for 12 memory channels, including the new Intel-developed MCR memory, which is considerably faster than standard DDR5.
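To put "considerably faster" in rough numbers, here is a minimal back-of-the-envelope sketch. It assumes standard DDR5 at 5600 MT/s and MCR DIMMs at the 8800 MT/s rate Intel and SK hynix have demonstrated; the 12-channel count comes from the announcement above, the transfer rates are assumptions, and real sustained bandwidth is lower than these theoretical peaks.

```python
# Back-of-the-envelope peak memory bandwidth per socket.
# Assumptions (illustrative, not Intel-published sustained figures):
#   - standard DDR5 at 5600 MT/s vs. MCR DIMMs at 8800 MT/s
#   - a 64-bit (8-byte) data bus per channel
CHANNELS = 12            # channel count cited above
BYTES_PER_TRANSFER = 8   # 64-bit DDR5 channel

def peak_gb_per_s(mt_per_s: int, channels: int = CHANNELS) -> float:
    """Theoretical peak bandwidth in GB/s at a given transfer rate."""
    return mt_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9

ddr5 = peak_gb_per_s(5600)  # ~538 GB/s
mcr = peak_gb_per_s(8800)   # ~845 GB/s
print(f"DDR5-5600: {ddr5:.0f} GB/s, MCR-8800: {mcr:.0f} GB/s ({mcr/ddr5:.2f}x)")
```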
According to benchmarks that Intel provided, the top-of-the-line Emerald Rapids outperformed the top-of-the-line fourth-gen CPU with a 1.4x gain in AI speech recognition and a 1.2x gain in the FFmpeg media transcode workload. All in all, Intel claims a 2x to 3x improvement in AI workloads, a 2.8x boost in memory throughput, and a 2.9x improvement in the DeepMD+LAMMPS AI inference workload.
Intel also provided some details on the upcoming Gaudi 3 processor for AI inferencing. Gaudi 3 will be the last of the standalone Gaudi accelerators before the company merges Gaudi with its GPU technology into a single product known as Falcon Shores.
The 5nm Gaudi 3 will have four times the BF16 performance of Gaudi 2, twice the networking bandwidth (Gaudi 2 has 24 built-in 100GbE RoCE NICs), and 1.5 times the HBM capacity.
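Taking Gaudi 2's public numbers as a baseline, those multipliers imply concrete figures. A minimal sketch: the 24-NIC baseline is cited above and the 96 GB of HBM2E is Gaudi 2's published spec, but the derived Gaudi 3 values are back-of-the-envelope inferences, not Intel-confirmed specs.

```python
# Implied Gaudi 3 figures, derived from Intel's stated multipliers.
# Baseline: the 24 NICs cited above and Gaudi 2's published 96 GB of HBM2E.
# The derived values are inferences, not confirmed specs, and "twice the
# networking" could also mean faster links rather than more NICs.
gaudi2 = {"hbm_gb": 96, "nics_100gbe": 24}
multipliers = {"hbm_gb": 1.5, "nics_100gbe": 2.0}

gaudi3_implied = {spec: gaudi2[spec] * m for spec, m in multipliers.items()}
print(gaudi3_implied)  # {'hbm_gb': 144.0, 'nics_100gbe': 48.0}
```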
For a GPU, Falcon Shores will do a lot of non-graphics processing. It will support Ethernet switching and the CXL programming model.
Aurora supercomputer update
The fastest supercomputer in the world remains Frontier, an all-AMD beast at the Department of Energy's Oak Ridge National Laboratory in Tennessee. But Intel holds second place with Aurora, also at a DOE facility, and Aurora isn't even finished yet.
When it reaches full capacity, the Aurora supercomputer at the Argonne Leadership Computing Facility will utilize 21,248 Xeon Max CPUs and 63,744 Data Center GPU Max Series GPUs, making it the largest known GPU deployment in the world.
Intel hasn't released any formal benchmarks yet, but it did reveal one test. Intel and Argonne ran a generative AI project featuring a 1-trillion-parameter, GPT-3-style foundational LLM for science. For comparison, GPT-3.5, the model behind ChatGPT, uses 175 billion parameters.
Because of the massive amount of memory in each GPU Max, Aurora can run the model with only 64 nodes; Argonne National Laboratory ran four instances of the model in parallel on 256 nodes in total. The LLM is a science GPT: according to Intel, the models are trained on scientific text, code, and science datasets from diverse scientific domains, at scales of more than 1 trillion parameters.
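The 64-node figure is easy to sanity-check. A minimal sketch, assuming the published Aurora node design of six GPU Max devices with 128 GB of HBM each and 2-byte (BF16) parameters; training also needs room for activations, gradients, and optimizer state, which is where the remaining headroom goes.

```python
# Sanity check: why 64 Aurora nodes can hold a 1-trillion-parameter model.
# Assumptions: 6 GPU Max devices per node with 128 GB HBM each (the
# published node design), and 2 bytes per parameter (BF16 weights).
PARAMS = 1e12            # 1 trillion parameters
BYTES_PER_PARAM = 2      # BF16
GPUS_PER_NODE = 6
HBM_PER_GPU_GB = 128
NODES = 64

weights_tb = PARAMS * BYTES_PER_PARAM / 1e12             # 2 TB of raw weights
hbm_tb = NODES * GPUS_PER_NODE * HBM_PER_GPU_GB / 1000   # ~49 TB of HBM

print(f"Weights: {weights_tb:.0f} TB; aggregate HBM across {NODES} nodes: "
      f"{hbm_tb:.1f} TB ({hbm_tb/weights_tb:.0f}x headroom for activations, "
      f"gradients, and optimizer state)")
```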
Copyright © 2023 IDG Communications, Inc.