Prepare to rewrite your AI Infrastructure Roadmap to account for Blackwell

Nvidia launched its "Blackwell" GPUs at the 2024 GPU Technology Conference (GTC) in San Jose, marking the seventh generation of its data center-class GPUs.

Since 2012, Nvidia's compute engines have gained 4,367X in raw floating-point performance over the original K10, which paired two GK104 GPUs twelve years ago. Notably, a significant share of that gain comes from cutting precision from FP32 single precision down to FP4 eighth precision; at constant precision, the improvement works out to 546X.
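A quick back-of-envelope check shows how those two numbers relate. This is an illustrative decomposition, not an official Nvidia figure: dropping from FP32 to FP4 shrinks the datatype by 8X, and dividing the total gain by that factor recovers the iso-precision gain.

```python
# Decompose the 4,367X total gain into a precision factor and an
# iso-precision gain (assumed decomposition, for illustration only).
total_gain = 4367          # K10 (2012) -> Blackwell, per the article
precision_factor = 32 / 4  # FP32 -> FP4 is an 8X reduction in datatype width
iso_precision_gain = total_gain / precision_factor

print(round(iso_precision_gain))  # -> 546, matching the figure quoted above
```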

Advancements in NVLink networking allow hyperscalers, cloud builders, HPC centers, and others to tightly integrate hundreds of GPUs' memory and compute capabilities. Moreover, improvements in InfiniBand and Ethernet networking enable the connection of tens of thousands of GPUs to construct immensely powerful AI supercomputers that significantly accelerate HPC and data analytics workloads.

How much of a performance jump the "Blackwell" B100 and B200 GPU accelerators deliver over their "Hopper" H100 and H200 predecessors will be revealed during Jensen Huang's keynote presentation on Wednesday, 20 March.

The Design:

The Blackwell GPU complex packs 208 billion transistors and is fabricated on TSMC's 4-nanometer 4NP process. It consists of two reticle-sized GPU dies, each with 104 billion transistors, joined by a high-bandwidth chip-to-chip interconnect. Because TSMC's 3N process was not an option, the chips are somewhat larger and hotter; each die offers roughly 25 percent more floating-point capability despite potentially lower clock speeds, which works out to a 2.5X performance increase overall for the two-die package. The shift to FP4 precision doubles that again.
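The multipliers described above compose as simple arithmetic. A minimal sketch, ignoring clock and yield effects:

```python
# Rough arithmetic behind the generational claims above
# (illustrative only; real speedups depend on workload and clocks).
dies_per_package = 2    # two reticle-sized dies per Blackwell GPU
per_die_gain = 1.25     # ~25 percent more FP capability per die vs Hopper
fp4_factor = 2          # halving precision again (FP8 -> FP4) doubles throughput

iso_precision_speedup = dies_per_package * per_die_gain
print(iso_precision_speedup)               # 2.5X at constant precision
print(iso_precision_speedup * fp4_factor)  # 5.0X once FP4 is factored in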

The two Blackwell dies connect through a 10 TB/sec NVLink 5.0 chip-to-chip link, presenting a single unified GPU image to software, which is crucial for scaling across clusters. The top-end Blackwell B100 and B200 devices feature 192 GB of HBM3E memory and 8 TB/sec of memory bandwidth, a significant improvement in both capacity and bandwidth over prior generations.
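To put those memory figures in context, here is a small comparison sketch. The H100 and H200 numbers are commonly cited SXM specs and are my assumption, not from the article; the Blackwell figures are from the text above.

```python
# Memory capacity and bandwidth, Hopper vs top-end Blackwell.
# H100/H200 entries are commonly cited SXM specs (assumption);
# the B200 entry uses the figures quoted in the article.
gpus = {
    #        (capacity GB, bandwidth TB/s)
    "H100": (80, 3.35),
    "H200": (141, 4.80),
    "B200": (192, 8.00),
}

base_cap, base_bw = gpus["H100"]
for name, (cap_gb, bw_tbs) in gpus.items():
    print(f"{name}: {cap_gb} GB at {bw_tbs} TB/s "
          f"({cap_gb / base_cap:.1f}X capacity, {bw_tbs / base_bw:.1f}X bandwidth vs H100)")
```

On these assumed Hopper baselines, Blackwell's 192 GB works out to roughly 2.4X the H100's capacity and bandwidth.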

The Power:

The B100 GPU delivers a peak FP4 performance of 14 petaflops while staying within the 700-watt thermal design of its predecessor, the H100. The B200 raises that to 18 petaflops at FP4 precision, albeit at a power consumption of 1,000 watts. Insights from the excellent site The Next Platform indicate that the GPUs in the upcoming GB200 NVL72 system will be liquid-cooled and run at 1,200 watts, suggesting better performance per watt, a logical step forward.
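The quoted figures make the efficiency comparison easy to check. A minimal sketch using only the numbers above:

```python
# FP4 efficiency from the figures quoted above (petaflops and watts).
b100_pf, b100_w = 14, 700    # B100: 14 petaflops FP4 at 700 W
b200_pf, b200_w = 18, 1000   # B200: 18 petaflops FP4 at 1,000 W

# Convert petaflops per watt to teraflops per watt for readability.
print(b100_pf * 1000 / b100_w)  # 20.0 TFLOPS/W
print(b200_pf * 1000 / b200_w)  # 18.0 TFLOPS/W
```

Interestingly, on these quoted numbers the B200 buys its extra throughput at slightly lower efficiency than the B100, which is part of why a liquid-cooled, higher-power part would need a proportionally larger performance bump to improve performance per watt.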

The Challenge:

The challenge of designing a data center will only grow with the pace of GPU change, and we have yet to see the impact of GPUs from AMD and other companies now in development. Given how quickly GPUs and generative AI are evolving, the data center you design today will need completely different power and cooling within a few years. We will have to rethink what we are doing if we want to be at the forefront of housing generative AI companies. If we don't, they will do it themselves.

Embracing change in tech is like surfing: adapt or wipe out 🌊. As Henry Ford suggested, progress comes from bringing innovation to the table, not just improvements. Thoughts on navigating these waters? 🤔💡 #HPC #AI #datacenter
