
Showing posts with the label NVSwitch

Broadcom Adds New Architecture With Tomahawk Ultra

[Image source: Broadcom]
Tomahawk Ultra is a misnomer. Although the name leverages Tomahawk's brand equity, Tomahawk Ultra represents a new architecture. In fact, when development began, Broadcom's competitive target was InfiniBand. During development, however, AI scale-up interconnects emerged as a critical component of performance scaling, particularly for large language models (LLMs). Through luck or foresight, Tomahawk Ultra suddenly had a new and fast-growing target market, and the leading competitor was now NVIDIA's NVLink. In parallel, Broadcom built a multi-billion-dollar business in custom AI accelerators for hyperscalers, most notably Google. At the end of April, Broadcom announced its Scale-Up Ethernet (SUE) framework, which it published and contributed to the Open Compute Project (OCP). Appendix A of the framework includes a latency budget that allocates less than 250ns to the switch. At the time, we saw this as an impossibly low target for existing Eth...

Broadcom Pitches Ethernet for AI Scale Up

Tomahawk 6 is First to 102.4T
Through relentless execution, Broadcom has been first to market generation after generation in data-center switching. The company just announced sampling of Tomahawk 6 (TH6), its 102.4T Ethernet switch ASIC. This generation actually consists of two switch chips: TH6-200G with 512x200G SerDes and TH6-100G with 1,024x100G SerDes, both of which are sampling now. A version with fully co-packaged optics, TH6-Davisson, will follow on a to-be-announced schedule. Whereas Tomahawk 5 (TH5) is a monolithic 5nm chip, TH6 comprises a core die and separate chiplets for the two SerDes options, all of which use 3nm technology. [Figure source: Broadcom] For AI scale-out networks, TH6 enables a 128K-XPU network using only two switch tiers. Fewer tiers mean lower latency, simpler load balancing and congestion control, and fewer optics. The new chip is the first to handle 1.6T Ethernet ports, but it also handles up to 512x200GbE ports for maximum radix. Beyond sheer port density, TH...
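
As a quick sanity check of the two-tier claim, here is a minimal sketch of the arithmetic. It is our own calculation, assuming a non-blocking two-tier folded Clos and one 200G port per XPU; Broadcom may count differently.

# Capacity of a non-blocking two-tier folded Clos built from radix-R switches.
# Each leaf splits its ports evenly: R/2 down to XPUs, R/2 up to spines.
# A spine's R ports allow up to R leaves, so total endpoints = R * (R/2).

def two_tier_endpoints(radix: int) -> int:
    return radix * radix // 2

print(two_tier_endpoints(512))   # 131072 -> the 128K-XPU figure with 200G per XPU
print(two_tier_endpoints(1024))  # 524288 if each XPU instead takes a 100G port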

AI Unsurprisingly Dominates Hot Chips 2024

This year's edition of the annual Hot Chips conference represented the peak of the generative-AI hype cycle. Consistent with the theme, OpenAI's Trevor Cai made the bull case for AI compute in his keynote. At a conference known for technical disclosures, however, the presentations from merchant chip vendors were disappointing; despite a great lineup of talks, few new details emerged. Nvidia's Blackwell presentation mostly rehashed previously disclosed information. In a picture-is-worth-a-thousand-words moment, however, one slide included the photo of the GB200 NVL36 rack shown below. [Figure: GB200 NVL36 rack (Source: Nvidia)] Many customers prefer the NVL36 over the power-hungry NVL72 configuration, which requires a massive 120kW per rack. The key difference for our readers is that the NVLink switch trays shown in the middle of the rack have front-panel cages, whereas the "non-scalable" NVLink switch tray used in the NVL72 has only back-panel connectors for the NVLink spin...

All Eyes on NVIDIA

Aside from CEO Jensen Huang, the DGX GB200 NVL72 was the star of the GTC 2024 keynote. The rackscale system integrates 72 next-generation Blackwell GPUs connected by NVLink to form “1 Giant GPU.” Jensen’s description of the NVLink passive-copper “backplane” caused a brief panic among investors who believed it somehow replaced InfiniBand, which it does not. The NVL72 represents next-generation AI systems, but Nvidia also revealed new details of its deployed Hopper-generation clusters. Next-generation 800G (XDR) InfiniBand won’t reach customers until 2025, so early Blackwell systems will use 400G (NDR) InfiniBand instead. [Figure source: NVIDIA] Jensen said the Hopper-generation EOS supercomputer had just come online. This cluster uses 608 NDR switches with 64 ports each for a total of 38,912 switch ports. This system places the leaf switches in racks at the end of the row, so all InfiniBand links employ optical transceivers. We estimate the servers add 5,120 ports for a system total of 44,032 N...
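
The port totals above are simple arithmetic; the sketch below shows our accounting. The 5,120 server-side ports are our own estimate, not an NVIDIA figure.

ndr_switches = 608
ports_per_switch = 64
switch_ports = ndr_switches * ports_per_switch     # 608 x 64 = 38,912 switch ports

estimated_server_ports = 5_120                     # our server-side estimate
total_ndr_ports = switch_ports + estimated_server_ports

print(switch_ports, total_ndr_ports)               # 38912 44032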

NVIDIA Reveals DGX GH200 System Architecture

We industry analysts sometimes get out over our skis when trying to project details of new products. Following NVIDIA's DGX GH200 announcement at Computex, we noticed the industry press making the same mistake. Rather than correct our previous NVLink Network post, we'll explain here what we've since learned. Our revelation came when NVIDIA published a white paper titled NVIDIA Grace Hopper Superchip Architecture. The figure below from that paper shows the interconnects at the HGX-module level. The GPU (Hopper) side uses NVLink as a coherent interconnect, whereas the CPU (Grace) side uses InfiniBand, in this case connected through a BlueField-3 DPU. From a networking perspective, the NVLink and InfiniBand domains are independent; that is, there is no bridging between the two protocols. [Figure: HGX Grace Hopper Superchip System With NVLink Switch (Source: NVIDIA)] The new DGX GH200 builds a SuperPOD based on this underlying module-level architecture. You've probably seen the headlin...

NVIDIA Networks NVLink

I attended several sessions at last week's Hot Chips, and NVIDIA's NVSwitch talk was a standout. Ashraf Eassa did a great job of covering the talk's contents in an NVIDIA blog, so I will focus on analysis here.
SuperPOD Bids Adieu to InfiniBand
From a system-architecture perspective, the biggest change is extending NVLink beyond a single chassis. NVLink Network is a new protocol built on the NVLink4 link layer. It reuses 400G Ethernet cabling to enable passive-copper (DAC), active-copper (AEC), and optical links. To build its DGX H100 SuperPOD, NVIDIA designed a 1U switch system around a pair of NVSwitch chips. The pod ("scalable unit") includes a central rack with 18 NVLink Switch systems, connecting 32 DGX H100 nodes in a two-level fat-tree topology. This pod interconnect yields 460.8Tbps of bisection bandwidth. NVLink Network replaces InfiniBand (IB) as the first level of interconnect in DGX SuperPODs. The A100 generation pod uses an IB HDR leaf/spine inte...
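
As a rough check of the 460.8Tbps figure, here is a back-of-the-envelope sketch. It is our own arithmetic, assuming each H100's full NVLink bandwidth (450GB/s per direction) can cross the pod network.

nodes = 32                       # DGX H100 nodes per pod ("scalable unit")
gpus_per_node = 8
gbps_per_gpu = 450 * 8           # 450 GB/s per direction per H100 = 3,600 Gb/s

injection_gbps = nodes * gpus_per_node * gbps_per_gpu   # total one-way injection
bisection_tbps = injection_gbps / 2 / 1000               # half crosses the bisection

print(bisection_tbps)            # 460.8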