Posts

Showing posts with the label InfiniBand

Broadcom Adds New Architecture With Tomahawk Ultra

Tomahawk Ultra is a misnomer. Although the name leverages Tomahawk's brand equity, Tomahawk Ultra represents a new architecture. In fact, when it began development, Broadcom's competitive target was InfiniBand. During development, however, AI scale-up interconnects emerged as a critical component of performance scaling, particularly for large language models (LLMs). Through luck or foresight, Tomahawk Ultra suddenly had a new and fast-growing target market. Now, the leading competitor was NVIDIA's NVLink. In parallel, Broadcom built a multi-billion-dollar business in custom AI accelerators for hyperscalers, most notably Google. At the end of April, Broadcom announced its Scale-Up Ethernet (SUE) framework, which it published and contributed to the Open Compute Project (OCP). Appendix A of the framework includes a latency budget, which allocates less than 250ns to the switch. At the time, we saw this as an impossibly low target for existing Eth...
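For context on why a sub-250ns allocation looked so aggressive, here is a minimal sketch comparing that budget with ballpark port-to-port switch latencies. The Ethernet and InfiniBand figures are assumptions for illustration only, not numbers from the SUE appendix.

```python
# Compare the SUE per-switch latency allocation against ballpark
# port-to-port latencies (assumed figures, for illustration only).
SUE_SWITCH_BUDGET_NS = 250  # Appendix A allocates <250ns to the switch

assumed_switch_latency_ns = {
    "typical cut-through Ethernet switch (assumed)": 600,
    "typical InfiniBand switch (assumed)": 130,
}

for device, latency in assumed_switch_latency_ns.items():
    verdict = "fits" if latency < SUE_SWITCH_BUDGET_NS else "exceeds"
    print(f"{device}: ~{latency}ns -> {verdict} the {SUE_SWITCH_BUDGET_NS}ns budget")
```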

NVIDIA Pivots as Networking Stalls

Yes, $11B in Blackwell revenue is impressive. Yes, Nvidia's data-center revenue grew 93% year over year. Under the surface, however, there's trouble in networking. In the January quarter (Q4 FY25), networking revenue declined 9% year over year and 3% sequentially. In its earnings call, CFO Colette Kress said that Nvidia's networking attach rate was "robust" at more than 75%. Her very next sentence, however, hinted at what's happening underneath that supposed robustness. "We are transitioning from small NVLink8 with InfiniBand to large NVLink72 with Spectrum-X," said Kress. About one year ago, Nvidia positioned InfiniBand for "AI factories" and Spectrum-X for multi-tenant clouds. That positioning collapsed when the company revealed that xAI had selected Spectrum-X for what is clearly an AI factory. InfiniBand appears to be retreating to its legacy HPC market while Ethernet comes to the fore.
[Chart: Nvidia Data-Center Revenue]
So how do we square 93% DC grow...
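To see how both headline numbers can be true at once, here is a minimal back-of-the-envelope sketch. The revenue split below is hypothetical, not Nvidia's reported segmentation; it serves only to show that compute must be growing well above the blended 93% rate.

```python
# Back-of-the-envelope: how can data-center revenue grow 93% YoY while
# networking revenue shrinks 9% YoY? Compute simply has to grow faster
# than the blended rate. The dollar figures below are hypothetical.
networking_prior = 3.0   # hypothetical networking revenue a year earlier ($B)
compute_prior = 15.0     # hypothetical compute revenue a year earlier ($B)

networking_now = networking_prior * (1 - 0.09)       # networking down 9% YoY
dc_now = (networking_prior + compute_prior) * 1.93   # total DC up 93% YoY
compute_now = dc_now - networking_now

compute_growth = compute_now / compute_prior - 1
networking_share = networking_now / dc_now
print(f"Implied compute growth: {compute_growth:.0%}")            # ~113%
print(f"Networking share of DC revenue: {networking_share:.0%}")  # ~8%
```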

All Eyes on NVIDIA

Aside from CEO Jensen Huang, the DGX GB200 NVL72 was the star of the GTC 2024 keynote. The rackscale system integrates 72 next-generation Blackwell GPUs connected by NVLink to form “1 Giant GPU.” Jensen’s description of the NVLink passive-copper “backplane” caused a brief panic among investors who believed it somehow replaced InfiniBand, which it does not. The NVL72 represents next-generation AI systems, but Nvidia also revealed new details of its deployed Hopper-generation clusters. Next-generation 800G (XDR) InfiniBand won’t reach customers until 2025, so early Blackwell systems will use 400G (NDR) InfiniBand instead. Jensen said the Hopper-generation EOS supercomputer had just come online. This cluster uses 608 NDR switches with 64 ports each for a total of 38,912 switch ports. This system places the leaf switches in racks at the end of the row, so all InfiniBand links employ optical transceivers. We estimate the servers add 5,120 ports for a system total of 44,032 N...
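The port arithmetic above is easy to reproduce; here is a minimal sketch using only the figures cited in the excerpt (the server-port number is our estimate).

```python
# Reproduce the EOS InfiniBand port arithmetic cited above.
ndr_switches = 608
ports_per_switch = 64
switch_ports = ndr_switches * ports_per_switch   # 38,912 switch ports

estimated_server_ports = 5_120                   # our estimate of server-side ports
total_ndr_ports = switch_ports + estimated_server_ports

print(f"Switch ports: {switch_ports:,}")         # 38,912
print(f"Total NDR ports: {total_ndr_ports:,}")   # 44,032
```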

Ultra Ethernet Promises New RDMA Protocol

This week saw the formal launch of the Ultra Ethernet Consortium (UEC), which aims to reinvent Ethernet fabrics for massive-scale AI and HPC deployments. An impressive list of founding members backs this ambitious effort: hyperscalers Meta and Microsoft; chip vendors AMD, Broadcom, and Intel; OEMs Arista, Atos, and HPE; and Cisco, which straddles the chip and OEM camps. Absent this backing, we could easily write off this consortium as doomed to failure. Our skepticism is rooted not in the obvious need the UEC looks to serve but rather in the challenges of standardizing and implementing a full-stack approach. The effort plans to replace existing transport protocols as well as user-space APIs. Specifically, the Ultra Ethernet Transport (UET) protocol will be a new RDMA protocol to replace RoCE, and new APIs will replace the Verbs API from the InfiniBand heritage. UET will provide an alternative to RoCEv2 and Amazon’s SRD, both of which are deployed in hyperscale data centers. (Source: Ul...

Spectrum-X: It's Bigger Than Software

There's been a lot of confusion around Spectrum-X, some of which NVIDIA seems to have created intentionally. The company's branding is part of the issue, as it seems to conflate Spectrum-X with the Spectrum line of Ethernet switch chips. In fact, Spectrum-X is simply a software license that enables new features across a set of existing hardware products. Reducing Spectrum-X to a bundle of software, however, undersells what NVIDIA has actually delivered. Working on top of the company's end-to-end Ethernet hardware, the software creates the first merchant congestion-managed Ethernet fabric. Minimizing tail latency is critical to AI-training workloads, as detailed in our recent white paper. We use the merchant qualifier because some hyperscalers have developed their own congestion-management schemes that work with standard Ethernet-switch hardware. One example is Amazon, which developed the scalable reliable datagram (SRD) protocol for use with its internally-developed Ni...
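As a toy illustration of why tail latency, not average latency, governs training performance, the sketch below models a collective step that finishes only when its slowest flow finishes. The flow count and latency numbers are made up for illustration.

```python
import random

# Toy model: a collective operation across many parallel flows completes
# only when the slowest flow completes, so a small tail of delayed flows
# sets the step time. All numbers here are made up for illustration.
random.seed(0)
flows = 1024
base_us = 50  # nominal flow completion time (microseconds)

def step_time(congestion_prob: float) -> float:
    """Completion time of one collective step: the max over all flows."""
    latencies = [
        base_us * (10 if random.random() < congestion_prob else 1)
        for _ in range(flows)
    ]
    return max(latencies)

for p in (0.0, 0.001, 0.01):
    print(f"congested-flow probability {p:.1%}: step time {step_time(p):.0f}us")
```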

NVIDIA Reveals DGX GH200 System Architecture

We industry analysts sometimes get out over our skis when trying to project details of new products. Following NVIDIA's DGX GH200 announcement at Computex, we noticed industry press making the same mistake. Rather than correct our previous NVLink Network post, we'll explain here what we've since learned. Our revelation came when NVIDIA published a white paper titled NVIDIA Grace Hopper Superchip Architecture. The figure below from that paper shows the interconnects at the HGX-module level. The GPU (Hopper) side uses NVLink as a coherent interconnect, whereas the CPU (Grace) side uses InfiniBand, in this case connected through a BlueField-3 DPU. From a networking perspective, the NVLink and InfiniBand domains are independent; that is, there is no bridging between the two protocols.
[Figure: HGX Grace Hopper Superchip System With NVLink Switch (Source: NVIDIA)]
The new DGX GH200 builds a SuperPOD based on this underlying module-level architecture. You've probably seen the headlin...

NVIDIA Networks NVLink

I attended several sessions at last week's Hot Chips, and NVIDIA's NVSwitch talk was a standout. Ashraf Eassa did a great job of covering the talk's contents in an NVIDIA blog, so I will focus on analysis here.
SuperPOD Bids Adieu to InfiniBand
From a system-architecture perspective, the biggest change is extending NVLink beyond a single chassis. NVLink Network is a new protocol built on the NVLink4 link layer. It reuses 400G Ethernet cabling to enable passive-copper (DAC), active-copper (AEC), and optical links. To build its DGX H100 SuperPOD, NVIDIA designed a 1U switch system around a pair of NVSwitch chips. The pod ("scalable unit") includes a central rack with 18 NVLink Switch systems, connecting 32 DGX H100 nodes in a two-level fat-tree topology. This pod interconnect yields 460.8Tbps of bisection bandwidth. NVLink Network replaces InfiniBand (IB) as the first level of interconnect in DGX SuperPODs. The A100 generation pod uses an IB HDR leaf/spine inte...
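The bisection-bandwidth figure can be sanity-checked with simple arithmetic; here is a minimal sketch, assuming the published 900GB/s of total NVLink bandwidth per H100 GPU (450GB/s in each direction).

```python
# Sanity-check the 460.8Tbps bisection-bandwidth figure for a 32-node pod.
nodes = 32
gpus_per_node = 8
gpus = nodes * gpus_per_node                    # 256 GPUs

# Assumes 450GB/s per direction per GPU (900GB/s total NVLink bandwidth).
nvlink_per_gpu_gbps = 450 * 8                   # 450GB/s = 3,600Gbps per direction
total_injection_tbps = gpus * nvlink_per_gpu_gbps / 1_000

# In a non-blocking fat tree, half the aggregate injection bandwidth
# crosses the bisection cut.
bisection_tbps = total_injection_tbps / 2
print(f"Bisection bandwidth: {bisection_tbps:.1f} Tbps")  # 460.8
```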