Intel addresses data centres with Sapphire Rapids processor
Another launch at this year’s Intel Architecture Day was the next-generation Intel Xeon Scalable processor (code-named Sapphire Rapids). The processor delivers substantial compute performance across dynamic and increasingly demanding data centre uses and is workload-optimised to deliver high performance on elastic compute models such as cloud, microservices and artificial intelligence (AI).
The processor is based on a tiled, modular SoC architecture that leverages Intel’s embedded multi-die interconnect bridge (EMIB) packaging technology, making it scalable while maintaining the benefits of a monolithic CPU interface. Sapphire Rapids provides a single, balanced unified memory access architecture, with every thread having full access to all resources on all tiles, including caches, memory and I/O. According to Intel, the processor offers consistent low latency and high cross-section bandwidth across the entire SoC.
Sapphire Rapids is built on Intel 7 process technology and features Intel’s new Performance-core microarchitecture (see softei news 23 August), which is designed for speed and pushes the limits of low latency and single-threaded application performance. Sapphire Rapids delivers the industry’s broadest range of data centre-relevant accelerators, including new instruction set architecture and integrated IP to increase performance across a broad range of customer workloads and usages.
The processor integrates acceleration engines, including the Intel Accelerator Interfacing Architecture (AIA), which supports efficient dispatch, synchronisation and signalling to accelerators and devices. There is also Intel Advanced Matrix Extensions (AMX), a workload acceleration engine that substantially speeds up the tensor processing at the heart of deep learning algorithms. It can deliver 2K INT8 and 1K BF16 operations per cycle, said Intel.
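To make the AMX claim concrete: the extensions add 2-D “tile” registers and tile matrix-multiply (TMUL) instructions that multiply INT8 (or BF16) inputs and accumulate into wider results. The NumPy sketch below mirrors only the arithmetic of one such INT8 tile operation, not the hardware tiling or instruction set; the 16×64×16 shape is an illustrative assumption, not a documented tile size.

```python
import numpy as np

# Toy model of the arithmetic an AMX tile multiply performs:
# INT8 x INT8 inputs accumulated into INT32 results.
# Shapes here are illustrative assumptions only.
M, K, N = 16, 64, 16
rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=(M, K), dtype=np.int8)
b = rng.integers(-128, 128, size=(K, N), dtype=np.int8)

# Widen before multiplying so products accumulate in INT32,
# as the hardware does, rather than overflowing INT8.
c = a.astype(np.int32) @ b.astype(np.int32)
print(c.shape)  # (16, 16)
```

The point of the hardware is that an entire tile multiply-accumulate of this kind completes with very high per-cycle throughput, rather than being decomposed into many vector instructions.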
In tests using early Sapphire Rapids silicon, optimised internal matrix-multiply micro-benchmarks ran over seven times faster using the Intel AMX instruction set extensions than a version of the same micro-benchmark using Intel AVX-512 VNNI instructions. This is a significant performance gain for AI workloads, covering both training and inference.
The Intel Data Streaming Accelerator (DSA) is designed to offload the most common data movement tasks, reducing the overhead they impose on the CPU and increasing overall workload performance. It can move data among the CPU, memory and caches, as well as all attached memory, storage and network devices.
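The pattern DSA enables can be sketched in software: submit a bulk data-move to a separate engine, keep the CPU busy with useful work, then collect the result on completion. The sketch below is a toy analogy only; a worker thread stands in for the hardware engine, whereas real DSA offload is driven through the operating system (for example, Linux’s idxd driver), not threads.

```python
from concurrent.futures import ThreadPoolExecutor

def bulk_copy(src: bytearray) -> bytes:
    # Stand-in for a hardware data-move descriptor being executed.
    return bytes(src)

# The "engine": one worker acting as the offload device (analogy only).
engine = ThreadPoolExecutor(max_workers=1)

src = bytearray(b"payload" * 1024)
job = engine.submit(bulk_copy, src)   # hand the copy off to the engine

other_work = sum(range(10_000))       # CPU does useful work meanwhile

dst = job.result()                    # wait for the move to complete
assert dst == bytes(src)
engine.shutdown()
```

The design point is the overlap: the copy and the computation proceed concurrently, so cycles the CPU would have spent on data movement go to the workload instead.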
The processor is built to drive industry technology transitions with advanced memory and next-generation I/O, including PCIe 5.0, CXL 1.1, DDR5 and HBM technologies.

An Infrastructure Processing Unit (IPU) is a programmable networking device designed to enable cloud and communication service providers to reduce overhead and free up performance for CPUs. Intel’s IPU-based architecture separates infrastructure functions from tenant workloads, allowing tenants to take full control of the CPU while the cloud operator offloads infrastructure tasks to the IPU. IPUs can also manage storage traffic, which reduces latency while efficiently using storage capacity via a diskless server architecture. The IPU gives users a secure, programmable and stable way to balance processing and storage resources.
Mount Evans is Intel’s first ASIC IPU. It integrates learnings from multiple generations of FPGA SmartNICs and offers high-performance network and storage virtualisation offload while maintaining a high degree of control. It provides a programmable packet processing engine for use cases such as firewalls and virtual routing, along with a hardware-accelerated NVMe storage interface, scaled up from Intel Optane technology, to emulate NVMe devices. Intel QuickAssist Technology provides advanced crypto and compression acceleration.