Decoding Nvidia's Rubin Networking Math

At GTC DC last month, Jensen Huang showed off components of the Vera Rubin NVL144 platform. First, here's the latest roadmap, which now includes BlueField-4 and BlueField-5. For more on that, see BWR Episode 4.

Source: Nvidia

Below is the Vera Rubin compute tray, which includes four Rubin GPUs. By GPU, we mean package, not die. Note that the Blackwell NVL72 and Rubin NVL144 both have 72 GPU packages; the NVL144 moniker reflects Nvidia's new math of counting dies. The company didn't rename the Blackwell configuration, even though that GPU also has two dies per package. Each compute tray has two Vera CPUs, which are 88-core Arm processors. Two GPUs connect with one CPU using NVLink-C2C, a coherent variant of NVLink. Although the roadmap above shows CX9 as 1600G, each ConnectX-9 is actually 800Gbps, so eight chips are required to deliver the aggregate 800GB/s quoted for the tray. That means each GPU has a pair of 800G Ethernet/InfiniBand NICs for scale-out networking. Finally, a single BlueField-4 provides the front-end network, connecting with the two Vera CPUs.

Source: Nvidia with overlay by Wheeler's Network
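A minimal sketch of the tray's scale-out arithmetic, using the figures discussed above (our reading of the tray, not an official spec sheet):

```python
# Sanity check on the compute tray's scale-out bandwidth.
# Assumes four Rubin GPUs per tray, two NICs per GPU, and 800Gbps per
# ConnectX-9 (despite the 1600G label on the roadmap), per the text above.
GPUS_PER_TRAY = 4
NICS_PER_GPU = 2
NIC_RATE_GBPS = 800

nics_per_tray = GPUS_PER_TRAY * NICS_PER_GPU        # 8 ConnectX-9 chips
tray_gbps = nics_per_tray * NIC_RATE_GBPS           # 6,400 Gbps
tray_gb_per_s = tray_gbps / 8                       # 800 GB/s, matching the quoted figure

print(f"{nics_per_tray} NICs x {NIC_RATE_GBPS}G = {tray_gbps} Gbps = {tray_gb_per_s:.0f} GB/s per tray")
```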

The first Rubin generation includes the NVLink 6 Switch, shown as 3,600GB/s. This figure, however, is actually the GPU's NVLink bandwidth, not the bandwidth of the switch ASIC. Rubin has twice the NVLink bandwidth per package compared with Blackwell. So far, so good: everything matches the advertised 260TB/s of scale-up bandwidth for the NVL144 (72 x 3,600GB/s).
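A quick check of that rack-level arithmetic, assuming the 3,600GB/s per-package figure applies to all 72 packages:

```python
# Rack-level scale-up check: 72 Rubin packages, each with 3,600 GB/s of NVLink 6.
GPU_PACKAGES = 72
NVLINK6_GB_PER_PACKAGE = 3_600

total_tb_per_s = GPU_PACKAGES * NVLINK6_GB_PER_PACKAGE / 1_000
print(f"{GPU_PACKAGES} x {NVLINK6_GB_PER_PACKAGE} GB/s = {total_tb_per_s:.1f} TB/s")
# -> 259.2 TB/s, which Nvidia rounds up to the advertised 260TB/s
```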

Below is the new NVLink switch tray, which clearly shows four NVLink ASICs. By comparison, the NVL72 switch tray uses two NVLink ASICs, so the NVLink 6 generation uses twice as many ASICs to double the bandwidth. The key design constraint for NVLink 6 is the requirement for backward compatibility with the Oberon rack introduced with Blackwell. More specifically, Nvidia is reusing the spine (or backplane) containing 5,184 passive copper cables.

Source: Nvidia
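As a sanity check on the spine reuse, the 5,184-cable count is consistent with NVLink 5's lane structure if we assume 18 links per Blackwell GPU, two 200Gbps lanes per direction per link, and one cable (differential pair) per lane. That breakdown is our inference, not an Nvidia disclosure; a rough sketch:

```python
# Consistency check on the 5,184-cable spine, assuming NVLink 5's lane structure:
# 18 links per Blackwell GPU, two 200Gbps lanes per direction per link,
# one passive copper cable (differential pair) per lane. Our inference only.
GPUS = 72
LINKS_PER_GPU = 18
LANES_PER_DIR = 2
DIRECTIONS = 2
LANE_RATE_GBPS = 200

cables = GPUS * LINKS_PER_GPU * LANES_PER_DIR * DIRECTIONS                           # 5,184
gpu_bw_gb_per_s = LINKS_PER_GPU * LANES_PER_DIR * DIRECTIONS * LANE_RATE_GBPS / 8    # 1,800 GB/s
print(f"{cables} cables, {gpu_bw_gb_per_s:.0f} GB/s of NVLink per Blackwell GPU")
```

Under those assumptions, the cable count and Blackwell's 1,800GB/s per-GPU NVLink bandwidth both fall out of the same lane structure, which is why doubling bandwidth over the same conductors is the interesting problem.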

So how does Rubin double NVLink bandwidth using the same number of conductors? The answer presumably lies in the "NVLink6 400G Custom SerDes." Now, before you take this at face value, we note that Huang's keynote at Computex 2024 shows the NVLink 5 switch ASIC as having "72-Ports 400G SerDes," even though this is actually 144 x 200G SerDes. Somehow dual 200G SerDes morphed into 400G SerDes thanks to some creative marketing.

This time, we believe the so-called 400G SerDes are really 200G SerDes that support simultaneous bi-directional signaling, doubling the effective bandwidth per cable. This approach would maintain the same Nyquist frequency as NVLink 5, keeping the electrical-channel parameters consistent across generations. It would introduce new circuit-design and signal-processing challenges, owing to the need to separate transmit and receive signals and perform echo cancellation. 

It's also possible that these are true 400G SerDes, but driving the existing backplane at such a rate would be extremely challenging. It would almost certainly require higher-order PAM to reduce the Nyquist frequency, as a 400G PAM4 lane would exceed 100GHz. Whether NVLink 6 uses bi-directional signaling or higher-order modulation, it will be unique when it reaches the market in 2H26. Hopefully, we will get official technology disclosures at GTC next March.
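To put rough numbers on the trade-off, the sketch below compares Nyquist frequencies for the signaling options discussed above. It ignores FEC and coding overhead, so real baud rates run somewhat higher, and the option labels are our framing rather than Nvidia's:

```python
import math

def nyquist_ghz(bit_rate_gbps: float, pam_levels: int) -> float:
    """Nyquist frequency of one PAM-M lane, ignoring FEC and coding overhead."""
    baud_gbd = bit_rate_gbps / math.log2(pam_levels)   # symbol rate in GBd
    return baud_gbd / 2

options = {
    "NVLink 5 lane: 200G PAM4":                             nyquist_ghz(200, 4),
    "NVLink 6, bi-directional 200G PAM4 (400G effective)":  nyquist_ghz(200, 4),
    "NVLink 6, true 400G PAM4":                             nyquist_ghz(400, 4),
    "NVLink 6, true 400G PAM8":                             nyquist_ghz(400, 8),
}
for name, freq in options.items():
    print(f"{name:55s} ~{freq:.0f} GHz")
```

Only the bi-directional option keeps the channel at NVLink 5's Nyquist frequency; a true 400G lane either pushes past 100GHz on PAM4 or trades signal-to-noise margin for a lower symbol rate with higher-order PAM.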


