Paul Morrison

Calm Before the Storm: How the GPU Shortage is Allowing Data Centers to Evolve for AI

The global shortage of graphics processing units (GPUs) has severely constrained AI implementations across industries. But this lull has provided an unexpected chance for data center operators to reinvent their physical infrastructure to support surging AI workloads.


Prompt: author, Claude. Image: Midjourney.

HPC + AI Wall Street conference, HPCwire panel: “HPC + AI Resources in the Great GPU Squeeze”

At the HPC + AI Wall Street conference held in New York on September 26, 2023, HPCwire hosted an industry panel titled “HPC + AI Resources in the Great GPU Squeeze” to examine strategies for high performance computing in an era of constrained GPU supply.

The panel was moderated by Doug Eadline, Managing Editor of HPCwire. The panelists were:

  • Thomas Jorgensen, Senior Director of Technology Enablement at Supermicro

  • Wayne Gorman, Solutions Manager for HPC and AI Infrastructure at Google Cloud

  • Kiran Agrahara, Cloud Solutions Architect at Intel

  • Prabhu Ramamoorthy, Global Partner Success Manager at NVIDIA

Thomas Jorgensen stated that Supermicro has “thousands of systems backordered due to lack of H100 GPUs.” This illustrates the severity of the shortage: major suppliers like Supermicro are unable to fulfill orders, and data center operators across industries likely face significant delays securing the GPUs needed for AI and HPC workloads.

Faced with lengthening GPU delivery timelines, pioneering data center operators like Meta have paused construction projects mid-development to fundamentally redesign their facilities for AI. The modern GPUs required to train ever-larger AI models draw extremely high power per rack. While current enterprise data center racks are designed for approximately 10 kW, AI-focused racks require around 50 kW today and 100-200 kW in the near future.

This looming 5-10x increase in density presents a major challenge for data center designers. Cooling, power distribution, and network infrastructure tailored for lower-density workloads cannot support the sheer power draw and low-latency interconnect demands of GPU-heavy AI training.
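To make that gap concrete, a rough back-of-envelope estimate helps. The short sketch below is purely illustrative and rests on assumed figures (an 8-GPU server drawing roughly 700 W per accelerator, a few kilowatts of host overhead, and six such servers per rack); it is not a vendor specification, but it shows how quickly a GPU-dense rack blows past the ~10 kW budget of a conventional enterprise rack.

```python
# Back-of-envelope rack power estimate (all figures below are illustrative assumptions).

GPU_TDP_W = 700          # assumed draw per accelerator (H100-class SXM parts are in this range)
GPUS_PER_SERVER = 8      # assumed 8-GPU training server
HOST_OVERHEAD_W = 2500   # assumed CPUs, memory, NICs, and fans per server
SERVERS_PER_RACK = 6     # assumed number of such servers packed into one rack

LEGACY_RACK_BUDGET_KW = 10  # typical enterprise rack budget cited above

server_kw = (GPUS_PER_SERVER * GPU_TDP_W + HOST_OVERHEAD_W) / 1000
rack_kw = SERVERS_PER_RACK * server_kw

print(f"Per-server draw : {server_kw:.1f} kW")
print(f"Per-rack draw   : {rack_kw:.1f} kW")
print(f"Density increase: {rack_kw / LEGACY_RACK_BUDGET_KW:.1f}x a {LEGACY_RACK_BUDGET_KW} kW enterprise rack")
```

Even with these conservative assumptions the rack lands near the 50 kW figure quoted above; denser configurations or next-generation accelerators push the same arithmetic well past 100 kW.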

Meta’s redesigns account for liquid cooling, direct high-density power delivery to racks, and advanced networking required to fully leverage the capabilities of these dense, power-hungry GPUs. Other pioneering operators are exploring innovations like immersion cooling for GPU racks and HVDC power distribution for minimal losses.

Meanwhile, suppliers are rapidly developing higher efficiency processors and accelerators to address AI’s insatiable compute requirements. Kiran Agrahara of Intel noted diverse options like CPUs, FPGAs, and new AI-focused accelerators provide flexibility despite ongoing GPU supply constraints. Prabhu Ramamoorthy of NVIDIA emphasized the need to explore creative solutions and alternative technologies rather than wait for GPU supply to improve. This lull has granted the industry time to rethink data center architectures to unlock AI's immense potential.

The Rise of High-Density Data Centers

This unexpected lull before the full brunt of the AI compute explosion has granted the data center industry invaluable time to re-architect infrastructure for the 200+ kW rack power densities needed in the imminent future. AI is spurring a complete reimagining of data center design principles and assumptions. Facilities built around AI will look radically different than preceding generations oriented toward general purpose CPUs at much lower densities.

As Wayne Gorman of Google Cloud summarized: "I see AI as a domain within HPC." Tomorrow's data centers will holistically integrate optimized AI capabilities, rather than treat AI workloads as an add-on. Innovative high-density designs created during this temporary shortage-driven pause will be ready to meet soaring AI demands when surging mainstream GPU adoption resumes.


The H100 Alternative Contenders

  • AMD GPUs: AMD introduced the MI300A APU and MI300X GPU for AI and HPC workloads, optimized for large language models. The MI300A features 128GB of HBM3 memory and 24 Zen 4 CPU cores; the MI300X offers up to 192GB of HBM3, 153 billion transistors, and 5.2TB/s of memory bandwidth, and is claimed to be the fastest GPU for generative AI.

  • Intel AMX: Built-in matrix-multiply accelerators in Intel Xeon Scalable processors, designed to improve AI training and inference performance directly on CPUs. Shown to enhance AI inference on Alibaba Cloud and BERT throughput with Tencent, AMX provides a way to accelerate AI workloads natively on Intel CPUs rather than on specialized accelerators like Habana's Gaudi2 and Greco.

  • Intel FPGAs and ASICs: Intel's Infrastructure Processing Units (IPUs) offload tasks like security and virtualization from CPUs to improve efficiency. The 2nd-gen 200G IPUs include the FPGA-based Oak Springs Canyon and the Mount Evans ASIC co-developed with Google; both support the common IPDK programming framework, and the roadmap includes 400G and 800G IPUs.

  • Intel Habana Gaudi2: Discrete AI training accelerator from Intel's Habana Labs, upgraded from Gaudi to a 7nm process with 24 tensor cores (vs. 10), 96GB of memory (vs. 32GB), and 48MB of SRAM (vs. 32MB). Shows up to 3.2x the performance of Gaudi and 2.8x the throughput of Nvidia's A100 on AI workloads.

  • Intel Habana Greco: Discrete AI inference accelerator from Habana Labs, also moved to a 7nm process. Upgraded to LPDDR5 memory for 5x the bandwidth of Goya and 128MB of on-chip memory (vs. 50MB); a lower 75W TDP (vs. 200W for Goya) allows higher-density deployments.

  • Intel Xeon CPUs: 4th-gen Xeon Scalable processors designed to unlock new performance levels across a breadth of AI workloads. The Xeon CPU Max Series with HBM delivers up to 4.8x better AI performance, built-in accelerators such as DL Boost and AVX-512 speed common operations, and a new Efficient-core architecture is optimized for AI efficiency; together these are designed to deliver improved inference and training performance across a wide range of AI applications.

  • Nvidia A100 GPU: Flagship data center GPU based on the Ampere architecture, designed for AI and HPC workloads. Built on a 7nm process with 40GB or 80GB memory options, peak FP32 performance up to 19.5 TFLOPS, and 3rd-gen NVLink/NVSwitch interconnects; MIG partitioning and multi-GPU scaling deliver flexibility and scalability.

  • Nvidia L40 GPU: Based on Nvidia's Ada Lovelace architecture, a newer GPU optimized for AI and graphics performance in data centers, designed to offer excellent power efficiency for enterprises integrating AI into their operations; delivers 91.6 teraFLOPS of FP32 performance.

  • Xilinx Versal AI Core: Xilinx's Versal series represents a strategic shift from FPGAs to integrated platform chips with programmable logic, AI engines, scalar/adaptable engines, advanced I/O, video decoders, and a NoC; provides over 100x the compute of current server CPUs for AI inference and wireless acceleration.
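As a quick way to keep these options comparable, the specifications quoted above can be dropped into a small data structure and ranked. The sketch below is a minimal example using only figures stated in the list (on-package memory and peak FP32 throughput where given); entries left as None are simply not listed above, and memory capacity is used as the ranking key because it is often the gating factor for large language models.

```python
# Minimal comparison of the H100 alternatives, using only figures quoted above.
# A value of None means the figure is not listed (not that it is zero).

accelerators = {
    "AMD MI300X":          {"memory_gb": 192, "fp32_tflops": None},
    "Intel Habana Gaudi2": {"memory_gb": 96,  "fp32_tflops": None},
    "Nvidia A100 (80GB)":  {"memory_gb": 80,  "fp32_tflops": 19.5},
    "Nvidia L40":          {"memory_gb": None, "fp32_tflops": 91.6},
}

# Rank by on-package memory, skipping parts whose capacity is not stated above.
ranked = sorted(
    (name for name, spec in accelerators.items() if spec["memory_gb"] is not None),
    key=lambda name: accelerators[name]["memory_gb"],
    reverse=True,
)

for name in ranked:
    print(f"{name:22s} {accelerators[name]['memory_gb']:>4d} GB")
```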