Posts

White Paper: Xsight Softens the DPU

Powering SmartNICs, the data-processing unit (DPU) has become nearly ubiquitous in the leading public clouds. Existing designs maximize power efficiency for a constrained feature set, and they require proprietary software tools. Xsight Labs aims to break this paradigm with its new E1 DPU, which promises the openness of an Arm server CPU. Xsight Labs sponsored the creation of this white paper, but the opinions and analysis are those of the author. Download the white paper for free, no registration required: Xsight E1 DPU

White Paper: Xsight Recharges the Cloud ToR

Cloud-datacenter operators are driving rapid adoption of 800Gbps optical modules while also upgrading compute-server NICs to 400Gbps speeds. The 51.2Tbps switch chips designed for these network fabrics, however, deliver too much capacity for top-of-rack switch systems. With its X2, Xsight Labs developed a unique chip aimed at optimizing compute racks by enabling 100Gbps-per-lane server links and 800Gbps uplink optics. Xsight Labs sponsored the creation of this white paper, but the opinions and analysis are those of the author. Download the white paper for free, no registration required: Xsight X2

AI Unsurprisingly Dominates Hot Chips 2024

This year's edition of the annual Hot Chips conference represented the peak in the generative-AI hype cycle. Consistent with the theme, OpenAI's Trevor Cai made the bull case for AI compute in his keynote. At a conference known for technical disclosures, however, the presentations from merchant chip vendors were disappointing; despite a great lineup of talks, few new details emerged. Nvidia's Blackwell presentation mostly rehashed previously disclosed information. In a picture-is-worth-a-thousand-words moment, however, one slide included the photo of the GB200 NVL36 rack shown below.

GB200 NVL36 rack (Source: Nvidia)

Many customers prefer the NVL36 over the power-hungry NVL72 configuration, which requires a massive 120kW per rack. The key difference for our readers is that the NVLink switch trays shown in the middle of the rack have front-panel cages, whereas the "non-scalable" NVLink switch tray used in the NVL72 has only back-panel connectors for the NVLink spin...

NVIDIA Reveals Roadmap at Computex

The annual Computex trade show in Taipei has traditionally been PC-centric, with ODMs showing their latest motherboards and systems. The 2024 event, however, included keynotes from Nvidia and others that revealed details of forthcoming datacenter GPUs, demonstrating the importance of the ODM ecosystem to the explosion of AI. The fact that Jensen Huang was born on the island made his keynote all the more impactful for the local audience. In the week following the CEO's keynote, Nvidia's market capitalization surpassed $3 trillion. From a networking perspective, the keynote focused on Ethernet rather than InfiniBand, as the former better fits the ecosystem messaging.

Source: NVIDIA

The datacenter section of Jensen's talk largely reminded the audience of what Nvidia announced at GTC in March. The Blackwell GPU, now in production, introduces NVLink5, which operates at 200Gbps per lane. It includes 18 NVLink ports with two lanes each, or 36x200Gbps serdes. The new NVLink...
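The NVLink5 lane count quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes the 200Gbps figure is a raw per-lane, per-direction signaling rate and ignores any encoding or protocol overhead, and the function and constant names are made up for this example.

```python
# Figures quoted in the excerpt: 18 NVLink ports, two lanes each, 200Gbps per lane.
NVLINK_PORTS = 18
LANES_PER_PORT = 2
GBPS_PER_LANE = 200  # assumption: raw rate per lane, per direction


def nvlink5_raw_bandwidth(ports=NVLINK_PORTS,
                          lanes_per_port=LANES_PER_PORT,
                          gbps_per_lane=GBPS_PER_LANE):
    """Return (serdes lane count, raw Gbps per direction) for one GPU."""
    lanes = ports * lanes_per_port
    return lanes, lanes * gbps_per_lane


lanes, gbps_per_direction = nvlink5_raw_bandwidth()
gbytes_per_direction = gbps_per_direction / 8  # bits to bytes

print(f"{lanes} lanes, {gbps_per_direction} Gbps "
      f"({gbytes_per_direction:.0f} GB/s) raw per direction")
```

Under these assumptions the 36 lanes add up to 7.2Tbps, or 900GB/s, of raw bandwidth in each direction per GPU.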

PAM4 DSPs Battle LPO for OFC Mindshare

Last year, module vendors demonstrated the first 1.6T optical modules, and this year DSP vendors looked ahead to second-generation 1.6T module designs. Whereas the first 1.6T modules connect a 16x100G host interface to 8x200G optics (16:8), next-generation designs will work with forthcoming 200G/lane switch ASICs, as shown in the top row of the figure. Broadcom disclosed its Sian2 1.6T 8:8 DSP at a March investor event, and Marvell followed by announcing its similar Nova 2 at OFC. Not wanting to be left out of the 1.6T landscape, MaxLinear pre-announced Rushmore, which similarly targets 8:8 designs. Although the company withheld product details, it disclosed Samsung Foundry as its manufacturing partner for Rushmore, setting it apart from competitors using TSMC.

Source: Broadcom

Progress on linear pluggable optics (LPO) and other less-than-full-DSP variants was evident at 100G/lane, but vendors also set the stage for 200G/lane. Last November, Credo Semiconductor was first to announc...

All Eyes on NVIDIA

Aside from CEO Jensen Huang, the DGX GB200 NVL72 was the star of the GTC 2024 keynote. The rackscale system integrates 72 next-generation Blackwell GPUs connected by NVLink to form “1 Giant GPU.” Jensen’s description of the NVLink passive-copper “backplane” caused a brief panic among investors who believed it somehow replaced InfiniBand, which it does not. The NVL72 represents next-generation AI systems, but Nvidia also revealed new details of its deployed Hopper-generation clusters. Next-generation 800G (XDR) InfiniBand won’t reach customers until 2025, so early Blackwell systems will use 400G (NDR) InfiniBand instead.

Source: NVIDIA

Jensen said the Hopper-generation EOS supercomputer had just come online. This cluster uses 608 NDR switches with 64 ports each for a total of 38,912 switch ports. This system places the leaf switches in racks at the end of the row, so all InfiniBand links employ optical transceivers. We estimate the servers add 5,120 ports for a system total of 44,032 N...
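The port totals in the EOS paragraph follow from simple multiplication and addition, sketched below. The switch count and ports per switch are the figures quoted above; the server-side count is the excerpt's own estimate, and the variable names are hypothetical.

```python
# Figures quoted in the excerpt for the EOS cluster.
NDR_SWITCHES = 608
PORTS_PER_SWITCH = 64
SERVER_PORTS_ESTIMATE = 5_120  # the excerpt's estimate, not a disclosed figure

# Every switch contributes all of its ports to the switch-side total.
switch_ports = NDR_SWITCHES * PORTS_PER_SWITCH

# Adding the estimated server-side ports gives the system total.
total_ports = switch_ports + SERVER_PORTS_ESTIMATE

print(f"switch ports: {switch_ports:,}")
print(f"system total: {total_ports:,}")
```

Both results match the totals stated in the excerpt: 38,912 switch ports and 44,032 ports overall.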

AMD Looks to Infinity for AI Interconnects

With the formal launch of the MI300 GPU, AMD revealed new plans for scaling the multi-GPU interconnects vital to AI-training performance. The company's approach relies on a partner ecosystem, which stands in stark contrast with NVIDIA's end-to-end solutions. The plans revolve around AMD's proprietary Infinity Fabric and its underlying XGMI interconnect.

Infinity Fabric Adopts Switching

As with its prior generation, AMD uses XGMI to connect multiple MI300 GPUs in what it calls a hive. The hive shares a homogeneous memory space formed by the HBM attached to each GPU. In current designs, the GPUs connect directly using XGMI in a mesh or ring topology. Each MI300X GPU has up to seven Infinity Fabric links, each with 16 lanes. The 4th-gen Infinity Fabric supports up to 32Gbps per lane, yielding 128GB/s of bidirectional bandwidth per link. At the MI300 launch, Broadcom announced that its next-generation PCI Express (PCIe) switch chip will add support for XGMI. At last October...
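The 128GB/s per-link figure quoted above for Infinity Fabric can be reproduced from the lane count and lane rate. This is a rough sketch that assumes 32Gbps is the raw rate per lane in each direction and ignores encoding overhead; the seven-link aggregate is derived here rather than taken from the excerpt, and the names are illustrative.

```python
# Figures quoted in the excerpt for 4th-gen Infinity Fabric on MI300X.
LANES_PER_LINK = 16
GBPS_PER_LANE = 32   # assumption: raw rate per lane, per direction
LINKS_PER_GPU = 7    # "up to seven" links per GPU


def link_bandwidth_gbytes(lanes=LANES_PER_LINK, gbps_per_lane=GBPS_PER_LANE):
    """Return (per-direction, bidirectional) bandwidth of one link in GB/s."""
    per_direction = lanes * gbps_per_lane / 8  # bits to bytes
    return per_direction, per_direction * 2


per_direction, bidirectional = link_bandwidth_gbytes()
# Derived value: all seven links in use at the peak rate.
aggregate_bidirectional = bidirectional * LINKS_PER_GPU

print(f"per link: {per_direction:.0f} GB/s each way, "
      f"{bidirectional:.0f} GB/s bidirectional")
print(f"seven links: {aggregate_bidirectional:.0f} GB/s bidirectional")
```

Sixteen lanes at 32Gbps give 64GB/s in each direction, which doubles to the 128GB/s bidirectional figure stated for each link.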