Posts

NVIDIA Reveals DGX GH200 System Architecture

We industry analysts sometimes get out over our skis when trying to project details of new products. Following NVIDIA's DGX GH200 announcement at Computex, we noticed industry press making the same mistake. Rather than correct our previous NVLink Network post, we'll explain here what we've since learned. Our revelation came when NVIDIA published a white paper titled NVIDIA Grace Hopper Superchip Architecture. The figure below from that paper shows the interconnects at the HGX-module level. The GPU (Hopper) side uses NVLink as a coherent interconnect, whereas the CPU (Grace) side uses InfiniBand, in this case connected through a BlueField-3 DPU. From a networking perspective, the NVLink and InfiniBand domains are independent; that is, there is no bridging between the two protocols.

Figure: HGX Grace Hopper Superchip System With NVLink Switch (Source: NVIDIA)

The new DGX GH200 builds a SuperPOD based on this underlying module-level architecture. You've probably seen the headlin...

White Paper: Scaling Ethernet Fabrics for AI/ML

To train the latest AI models, cloud-service providers are investing in large accelerated-compute clusters. AI training, however, places different demands on network architects than standard compute instances do. As the highest-performing solutions, proprietary interconnects have dominated these clusters. Here, we examine an alternative fabric architecture built around ubiquitous Ethernet technology. Broadcom sponsored the creation of this white paper, but the opinions and analysis are those of the author. Download the full white paper for free, no registration required.

White Paper: The Evolution of Memory Tiering at Scale

With first-generation chips now available, the early hype around CXL is giving way to realistic performance expectations. At the same time, software support for memory tiering is advancing, building on prior work around NUMA and persistent memory. Finally, operators have deployed RDMA to enable storage disaggregation and high-performance workloads. Thanks to these advancements, main-memory disaggregation is now within reach. Enfabrica sponsored the creation of this white paper, but the opinions and analysis are those of the author. Download the full white paper for free, no registration required.

Marvell Teralynx Leapfrogs to 51.2T

About 18 months after acquiring Innovium, Marvell announced its next-generation Teralynx 10 data-center switch chip. The 51.2Tbps design moves the Teralynx architecture to 5nm process technology while also incorporating homegrown 112Gbps PAM4 serdes. It marks the end of the road for the 7nm Teralynx 8 chip, which Innovium sampled but was unable to move to production. Multiple sources confirm that third-party serdes IP was the culprit, and Innovium was only one of several affected customers. The end result for Marvell is that it skipped the 25.6Tbps generation and set its sights on 51.2Tbps. Meanwhile, it has shipped more than 5 million 400GbE ports with the first-generation Teralynx 7, mostly to one hyperscale customer. Teralynx 10 promises an unsurprising feature set, which actually validates Innovium's design approach. The startup found the right balance of programmability and performance, enabling protocol flexibility and P4 in-band network telemetry (P4-INT). Innovium also t...
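As a rough sanity check on the 51.2Tbps figure, the sketch below works out how many serdes lanes and Ethernet ports that capacity implies. It assumes each 112Gbps PAM4 serdes lane carries 100Gbps of Ethernet payload, with the higher signaling rate absorbing encoding and FEC overhead. The constant and function names are illustrative only, not anything from Marvell.

```python
# Back-of-envelope serdes-lane and port arithmetic for a 51.2Tbps switch chip.
# Assumption: each 112Gbps PAM4 serdes lane carries 100Gbps of Ethernet payload.
# Names below are hypothetical, chosen for illustration.

SWITCH_CAPACITY_GBPS = 51_200
PAYLOAD_PER_LANE_GBPS = 100


def lane_count(capacity_gbps: int, per_lane_gbps: int) -> int:
    """Serdes lanes needed to carry the full switch capacity."""
    return capacity_gbps // per_lane_gbps


def port_count(capacity_gbps: int, port_speed_gbps: int) -> int:
    """Ethernet ports of a given speed that the capacity can support."""
    return capacity_gbps // port_speed_gbps


lanes = lane_count(SWITCH_CAPACITY_GBPS, PAYLOAD_PER_LANE_GBPS)
print(f"serdes lanes: {lanes}")  # serdes lanes: 512
for speed in (800, 400):
    # 800GbE -> 64 ports, 400GbE -> 128 ports
    print(f"{speed}GbE ports: {port_count(SWITCH_CAPACITY_GBPS, speed)}")
```

The same arithmetic shows that the skipped 25.6Tbps generation would need only 256 lanes at the same 100Gbps per-lane rate.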

CXL Chip Market Poised for Rapid Growth

It's not often that we see a new interconnect come along that's a sure thing. By piggybacking on the PCI Express physical layer, however, CXL has become one of those rare birds. As is always the case with new technologies, it will take time for a multi-vendor ecosystem to mature. CXL offers many incremental steps along the architectural-evolution path, allowing the technology to ramp quickly while offering future iterations that enable truly composable systems.

It All Starts with Server CPUs

Although not officially launched, Intel's Sapphire Rapids is already shipping to early customers. Development platforms are also in partners' hands, enabling validation and testing of CXL components. AMD's Genoa is also about to launch with CXL support. The caveat for both vendors is that these first CPUs support only CXL 1.1, which lacks important features incorporated in the CXL 2.0 specification. Both versions ride atop PCIe Gen5, however, so the physical layer needn't ch...

NVIDIA Networks NVLink

I attended several sessions at last week's Hot Chips, and NVIDIA's NVSwitch talk was a standout. Ashraf Eassa did a great job of covering the talk's contents in an NVIDIA blog post, so I will focus on analysis here.

SuperPOD Bids Adieu to InfiniBand

From a system-architecture perspective, the biggest change is extending NVLink beyond a single chassis. NVLink Network is a new protocol built on the NVLink4 link layer. It reuses 400G Ethernet cabling to enable passive-copper (DAC), active-copper (AEC), and optical links. To build its DGX H100 SuperPOD, NVIDIA designed a 1U switch system around a pair of NVSwitch chips. The pod ("scalable unit") includes a central rack with 18 NVLink Switch systems, connecting 32 DGX H100 nodes in a two-level fat-tree topology. This pod interconnect yields 460.8Tbps of bisection bandwidth. NVLink Network replaces InfiniBand (IB) as the first level of interconnect in DGX SuperPODs. The A100 generation pod uses an IB HDR leaf/spine inte...
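The 460.8Tbps figure can be reproduced with simple arithmetic. The sketch below is one way to do it, not NVIDIA's stated method, and it relies on details the post doesn't give: 8 H100 GPUs per DGX H100 node, 18 NVLink4 links per GPU, 25GB/s per link in each direction, and a non-blocking fat tree in which every GPU on one side of a bisection cut can drive its full bandwidth across it. Only one direction of traffic across the cut is counted.

```python
# Back-of-envelope bisection bandwidth for the 32-node DGX H100 pod.
# Assumptions (not stated in the post): 8 GPUs per node, 18 NVLink4 links
# per GPU, 25 GB/s per link per direction, and a non-blocking fat tree.

NODES = 32
GPUS_PER_NODE = 8
LINKS_PER_GPU = 18
GBYTES_PER_LINK_PER_DIR = 25


def per_gpu_gbytes(links_per_gpu: int, gbytes_per_link: int) -> int:
    """NVLink bandwidth per GPU in one direction, in GB/s."""
    return links_per_gpu * gbytes_per_link


def bisection_tbps(nodes: int, gpus_per_node: int,
                   links_per_gpu: int, gbytes_per_link: int) -> float:
    """Traffic one half of the GPUs can send to the other half, in Tb/s."""
    gpus = nodes * gpus_per_node
    # Split the GPUs into two equal halves; each GPU in one half drives its
    # full per-GPU bandwidth across the cut (one direction only).
    cross_cut_gbytes = (gpus // 2) * per_gpu_gbytes(links_per_gpu, gbytes_per_link)
    return cross_cut_gbytes * 8 / 1000  # GB/s -> Gb/s -> Tb/s


print(per_gpu_gbytes(LINKS_PER_GPU, GBYTES_PER_LINK_PER_DIR))  # 450
print(bisection_tbps(NODES, GPUS_PER_NODE, LINKS_PER_GPU,
                     GBYTES_PER_LINK_PER_DIR))  # 460.8
```

The intermediate result, 128 GPUs times 450GB/s, is 57,600GB/s (57.6TB/s), the same quantity expressed in bytes. In this idealized model the result scales linearly with GPU count, so a 16-node pod would come out at 230.4Tbps.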

Upcoming Events: HC34 & OCP Summit

Just a quick note to say I'll be attending (virtually) Hot Chips on August 21-23 and plan to be on-site for the OCP Global Summit on October 18-20. Please reach out to schedule meetings and briefings either in advance or at these events.