Posts

AI Unsurprisingly Dominates Hot Chips 2024

This year's edition of the annual Hot Chips conference represented the peak in the generative-AI hype cycle. Consistent with the theme, OpenAI's Trevor Cai made the bull case for AI compute in his keynote. At a conference known for technical disclosures, however, the presentations from merchant chip vendors were disappointing; despite a great lineup of talks, few new details emerged. Nvidia's Blackwell presentation mostly rehashed previously disclosed information. In a picture-is-worth-a-thousand-words moment, however, one slide included the photo of the GB200 NVL36 rack shown below.

GB200 NVL36 rack (Source: Nvidia)

Many customers prefer the NVL36 over the power-hungry NVL72 configuration, which requires a massive 120kW per rack. The key difference for our readers is that the NVLink switch trays shown in the middle of the rack have front-panel cages, whereas the "non-scalable" NVLink switch tray used in the NVL72 has only back-panel connectors for the NVLink spin

NVIDIA Reveals Roadmap at Computex

The annual Computex trade show in Taipei has traditionally been PC-centric, with ODMs showing their latest motherboards and systems. The 2024 event, however, included keynotes from Nvidia and others that revealed details of forthcoming datacenter GPUs, demonstrating the importance of the ODM ecosystem to the explosion of AI. The fact that Jensen Huang was born on the island made his keynote all the more impactful for the local audience. In the week following the CEO's keynote, Nvidia's market capitalization surpassed $3 trillion. From a networking perspective, the keynote focused on Ethernet rather than InfiniBand, as the former is a better fit for the ecosystem messaging.

Source: NVIDIA

The datacenter section of Jensen's talk largely reminded the audience of what Nvidia announced at GTC in March. The Blackwell GPU, now in production, introduces NVLink5, which operates at 200Gbps per lane. It includes 18 NVLink ports with two lanes each, or 36x200Gbps serdes. The new NVLink
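The serdes count in the excerpt above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the port, lane, and per-lane figures come from the text, while the function name `nvlink5_totals` and the raw aggregate rate (which ignores any encoding or protocol overhead) are illustrative assumptions.

```python
# Figures quoted in the text: 18 NVLink5 ports, two lanes each, 200Gbps per lane.
PORTS = 18
LANES_PER_PORT = 2
GBPS_PER_LANE = 200


def nvlink5_totals(ports=PORTS, lanes_per_port=LANES_PER_PORT,
                   gbps_per_lane=GBPS_PER_LANE):
    """Return (serdes count, raw aggregate Gbps) for one GPU.

    The aggregate is a raw lane-rate sum; it is an assumption of this
    sketch that overhead is ignored.
    """
    serdes = ports * lanes_per_port
    raw_gbps = serdes * gbps_per_lane
    return serdes, raw_gbps


serdes, raw_gbps = nvlink5_totals()
print(f"{serdes} x {GBPS_PER_LANE}Gbps serdes, {raw_gbps} Gbps raw aggregate")
```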

PAM4 DSPs Battle LPO for OFC Mindshare

Last year, module vendors demonstrated the first 1.6T optical modules, and this year DSP vendors looked ahead to second-generation 1.6T module designs. Whereas the first 1.6T modules connect a 16x100G host interface to 8x200G optics (16:8), next-generation designs will work with forthcoming 200G/lane switch ASICs, as shown in the top row of the figure. Broadcom disclosed its Sian2 1.6T 8:8 DSP at a March investor event, and Marvell followed by announcing its similar Nova 2 at OFC. Not wanting to be left out of the 1.6T landscape, MaxLinear pre-announced Rushmore, which similarly targets 8:8 designs. Although the company withheld product details, it disclosed Samsung Foundry as its manufacturing partner for Rushmore, setting it apart from competitors using TSMC.

Source: Broadcom

Progress on linear pluggable optics (LPO) and other less-than-full-DSP variants was evident at 100G/lane, but vendors also set the stage for 200G/lane. Last November, Credo Semiconductor was first to announc

All Eyes on NVIDIA

Aside from CEO Jensen Huang, the DGX GB200 NVL72 was the star of the GTC 2024 keynote. The rackscale system integrates 72 next-generation Blackwell GPUs connected by NVLink to form “1 Giant GPU.” Jensen’s description of the NVLink passive-copper “backplane” caused a brief panic among investors who believed it somehow replaced InfiniBand, which it does not. The NVL72 represents next-generation AI systems, but Nvidia also revealed new details of its deployed Hopper-generation clusters. Next-generation 800G (XDR) InfiniBand won’t reach customers until 2025, so early Blackwell systems will use 400G (NDR) InfiniBand instead.

Source: NVIDIA

Jensen said the Hopper-generation EOS supercomputer had just come online. This cluster uses 608 NDR switches with 64 ports each for a total of 38,912 switch ports. This system places the leaf switches in racks at the end of the row, so all InfiniBand links employ optical transceivers. We estimate the servers add 5,120 ports for a system total of 44,032 N
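The port totals quoted in the excerpt above follow from simple multiplication and addition. The sketch below reproduces them; the switch count, ports per switch, and the estimated server-side ports are taken from the text, and the function name `eos_port_counts` is a hypothetical label used only for illustration.

```python
# Figures quoted in the text for the EOS cluster.
NDR_SWITCHES = 608
PORTS_PER_SWITCH = 64
SERVER_PORTS_ESTIMATE = 5_120  # the article's own estimate for server-side ports


def eos_port_counts(switches=NDR_SWITCHES, ports_per_switch=PORTS_PER_SWITCH,
                    server_ports=SERVER_PORTS_ESTIMATE):
    """Return (switch ports, system total) as plain integer arithmetic."""
    switch_ports = switches * ports_per_switch
    return switch_ports, switch_ports + server_ports


switch_ports, system_total = eos_port_counts()
print(f"switch ports: {switch_ports:,}")
print(f"system total: {system_total:,}")
```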

AMD Looks to Infinity for AI Interconnects

With the formal launch of the MI300 GPU, AMD revealed new plans for scaling the multi-GPU interconnects vital to AI-training performance. The company's approach relies on a partner ecosystem, which stands in stark contrast with NVIDIA's end-to-end solutions. The plans revolve around AMD's proprietary Infinity Fabric and its underlying XGMI interconnect.

Infinity Fabric Adopts Switching

As with its prior generation, AMD uses XGMI to connect multiple MI300 GPUs in what it calls a hive. The hive shares a homogeneous memory space formed by the HBM attached to each GPU. In current designs, the GPUs connect directly using XGMI in a mesh or ring topology. Each MI300X GPU has up to seven Infinity Fabric links, each with 16 lanes. The 4th-gen Infinity Fabric supports up to 32Gbps per lane, yielding 128GB/s of bidirectional bandwidth per link. At the MI300 launch, Broadcom announced that its next-generation PCI Express (PCIe) switch chip will add support for XGMI. At last October'
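The 128GB/s per-link figure in the excerpt above can be derived from the lane count and lane rate. The sketch below walks through that conversion; the lane figures come from the text, while the function name `xgmi_link_bandwidth_gbytes` and the use of raw lane rate (no encoding overhead subtracted) are assumptions made for illustration.

```python
# Figures quoted in the text: 16 lanes per Infinity Fabric link, up to 32Gbps per lane.
LANES_PER_LINK = 16
GBPS_PER_LANE = 32
BITS_PER_BYTE = 8


def xgmi_link_bandwidth_gbytes(lanes=LANES_PER_LINK, gbps_per_lane=GBPS_PER_LANE):
    """Return (per-direction GB/s, bidirectional GB/s) for one link.

    Uses the raw lane rate; treating it as overhead-free is an assumption
    of this sketch.
    """
    per_direction = lanes * gbps_per_lane / BITS_PER_BYTE  # 512 Gbps -> 64 GB/s
    return per_direction, 2 * per_direction


one_way, both_ways = xgmi_link_bandwidth_gbytes()
print(f"{one_way:.0f} GB/s per direction, {both_ways:.0f} GB/s bidirectional per link")
```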

White Paper: Broadcom's Amazing Shrinking Router

Communications-service providers are under pressure to transform their networks for greater bandwidth and efficiency while also enabling new revenue-generating services. In some cases, they have disrupted the traditional supply chain, turning to white-box designs powered by merchant silicon. Broadcom has been at the forefront of this trend with its DNX line of Ethernet chips, better known by the Jericho and Qumran code names. Here, we look at the migration to fixed-configuration systems and how Broadcom’s 5nm Qumran3D chip can serve in the next router generation. Broadcom sponsored the creation of this white paper, but the opinions and analysis are those of the author. Download the white paper for free, no registration required. Qumran3D

Optics Grab Attention at Hot Chips 2023

August marked the in-person return of the Hot Chips conference at Stanford University in California, and the sold-out 35th edition included plenty of deep technical content. AI/ML garnered lots of attention, and optical interconnects were featured in both chip- and system-level AI and HPC talks. NVIDIA’s chief scientist, Bill Dally, keynoted Day 2 with a talk reviewing how accelerators achieved a 1,000x performance increase over the last 10 years. His big-picture view provided excellent context for AI-system design, but networking received only an honorable mention this year. Instead, Dally discussed future directions for accelerated compute. Following the keynote, an ML-Training session presented talks from Google and Cerebras. The technical lead for TPUs at Google, Norm Jouppi, made it clear he could only discuss the n-1 generation, meaning TPUv4. Meanwhile, Google revealed the TPUv5e at its own Google Cloud Next event the same day but provided only high-level specification