NVIDIA's DGX GH200 Supercomputer

Hana M | May 30, 2023 | 10:30 AM | Technology

NVIDIA's DGX GH200 supercomputer links 256 Grace Hopper Superchips into what operates as a single 1-exaflop GPU, built for giant models in generative AI, recommender systems, and high-performance data processing. [1]

Figure 1. NVIDIA's DGX GH200 Supercomputer.

NVIDIA has unveiled the DGX GH200 supercomputer (Figure 1), a new class of AI system built on NVIDIA GH200 Grace Hopper Superchips and the NVLink Switch System. It is designed for developing giant models for generative AI language applications, recommender systems, and data analytics workloads. NVLink interconnect technology combines 256 GH200 Superchips so that they operate as a single GPU with one large shared memory space, delivering 1 exaflop of performance and 144 terabytes of shared memory. That is nearly 500 times the memory of the previous-generation NVIDIA DGX A100, introduced in 2020. [1]
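The headline figures can be sanity-checked with simple arithmetic. The sketch below is only a back-of-the-envelope check, not NVIDIA's own accounting: the per-superchip memory split (480 GB of CPU memory plus 96 GB of GPU memory) and the 320 GB capacity of the 2020 DGX A100 are assumptions that do not appear in the article, and the article does not say at which numeric precision the 1-exaflop figure is measured.

```python
# Back-of-the-envelope check of the DGX GH200 headline figures.
# Values marked "assumed" are NOT from the article.
SUPERCHIPS = 256          # from the article
CPU_MEMORY_GB = 480       # assumed: Grace CPU memory per superchip
GPU_MEMORY_GB = 96        # assumed: Hopper GPU memory per superchip
DGX_A100_MEMORY_GB = 320  # assumed: 8 GPUs x 40 GB in the 2020 DGX A100

# Shared memory: per-superchip capacity times the number of superchips.
per_chip_gb = CPU_MEMORY_GB + GPU_MEMORY_GB
total_gb = SUPERCHIPS * per_chip_gb
total_tb = total_gb / 1024  # binary terabytes

# Ratio behind the "nearly 500 times" memory claim.
memory_ratio = total_gb / DGX_A100_MEMORY_GB

# Per-superchip share of the quoted 1 exaflop (precision not stated in the article).
per_chip_pflops = 1e18 / SUPERCHIPS / 1e15

print(f"Memory per superchip: {per_chip_gb} GB")
print(f"Total shared memory:  {total_tb:.0f} TB")
print(f"Ratio vs DGX A100:    {memory_ratio:.1f}x")
print(f"Compute per superchip: {per_chip_pflops:.2f} petaflops")
```

Under these assumptions the total comes to 144 TB, and the ratio to the DGX A100 is about 461x, which is consistent with the "nearly 500 times" wording.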

“Generative AI, large language models and recommender systems are the digital engines of the modern economy,” said Jensen Huang, founder and CEO of NVIDIA. “DGX GH200 AI supercomputers integrate NVIDIA’s most advanced accelerated computing and networking technologies to expand the frontier of AI.” [1]

NVIDIA NVLink Technology Expands AI at Scale

NVIDIA's GH200 Superchips replace the traditional CPU-to-GPU PCIe connection by combining an Arm-based NVIDIA Grace CPU and an NVIDIA H100 Tensor Core GPU in a single package, linked by NVIDIA NVLink-C2C chip interconnects. This increases bandwidth between the GPU and CPU by 7x compared with the latest PCIe technology, cuts interconnect power consumption by more than 5x, and provides a 600GB Hopper architecture GPU building block for DGX GH200 supercomputers. [1]
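As a rough illustration of where a 7x figure could come from, the sketch below compares two bandwidth numbers. Both are assumptions rather than values from the article: 900 GB/s for NVLink-C2C and 128 GB/s for a bidirectional PCIe Gen5 x16 link.

```python
# Rough check of the "7x" CPU-GPU bandwidth claim.
# Both bandwidth values are assumptions, NOT taken from the article.
NVLINK_C2C_GB_PER_S = 900     # assumed NVLink-C2C total bandwidth
PCIE_GEN5_X16_GB_PER_S = 128  # assumed PCIe Gen5 x16 bidirectional bandwidth

ratio = NVLINK_C2C_GB_PER_S / PCIE_GEN5_X16_GB_PER_S
print(f"NVLink-C2C vs PCIe Gen5 x16: {ratio:.2f}x")
```

With these assumed inputs the ratio comes out just above 7, in line with the claim quoted above.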

The DGX GH200 pairs Grace Hopper Superchips with the NVIDIA NVLink Switch System, an interconnect that lets all GPUs in the system work together as one. By contrast, the previous-generation system allowed only eight GPUs to be combined with NVLink as one GPU without compromising performance. [1]

The DGX GH200 architecture provides 48x more NVLink bandwidth than the previous generation, giving users the power of a very large AI supercomputer with the simplicity of programming a single GPU. [1]

A New Research Tool for AI Pioneers

Google Cloud, Meta and Microsoft are among the first expected to gain access to the DGX GH200 to explore its capabilities for generative AI workloads. NVIDIA also intends to provide the DGX GH200 design as a blueprint to cloud service providers and other hyperscalers so they can further customize it for their infrastructure. [1]

New NVIDIA Helios Supercomputer to Advance Research and Development

NVIDIA Helios, the company's own DGX GH200-based supercomputer, is set to power its research and development work. Helios will comprise four DGX GH200 systems, interconnected with NVIDIA Quantum-2 InfiniBand networking to increase data throughput for training large AI models. With 1,024 Grace Hopper Superchips in total (four systems of 256 each), it is expected to be operational by the end of 2023. [1]

Fully Integrated and Purpose-Built for Giant Models

DGX GH200 supercomputers ship with NVIDIA software, giving a ready-to-use, full-stack solution for large AI and data analytics workloads. NVIDIA Base Command software provides AI workflow management, enterprise-grade cluster management, libraries that accelerate compute, storage, and network infrastructure, and system software optimized for AI workloads. [1]

The systems also include NVIDIA AI Enterprise, the software layer of the NVIDIA AI platform. It provides more than 100 frameworks, pretrained models, and development tools that streamline the development and deployment of production AI, including generative AI, computer vision, and speech AI. [1]

Source: NVIDIA

References:

  1. https://nvidianews.nvidia.com/news/nvidia-announces-dgx-gh200-ai-supercomputer

Cite this article:

Hana M (2023), NVIDIA's DGX GH200 Supercomputer, AnaTechmaz, pp.280
