Posts

PAM4 DSPs Battle LPO for OFC Mindshare

Last year, module vendors demonstrated the first 1.6T optical modules, and this year DSP vendors looked ahead to second-generation 1.6T module designs. Whereas the first 1.6T modules connect a 16x100G host interface to 8x200G optics (16:8), next-generation designs will work with forthcoming 200G/lane switch ASICs, as shown in the top row of the figure. Broadcom disclosed its Sian2 1.6T 8:8 DSP at a March investor event, and Marvell followed by announcing its similar Nova 2 at OFC. Not wanting to be left out of the 1.6T landscape, MaxLinear pre-announced Rushmore, which similarly targets 8:8 designs. Although the company withheld product details, it disclosed Samsung Foundry as its manufacturing partner for Rushmore, setting it apart from competitors using TSMC.

Source: Broadcom

Progress on linear pluggable optics (LPO) and other less-than-full-DSP variants was evident at 100G/lane, but vendors also set the stage for 200G/lane. Last November, Credo Semiconductor was first to announce…
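The 16:8 and 8:8 labels describe the electrical-to-optical lane mapping at a constant 1.6T aggregate. A minimal sketch of that arithmetic (illustrative only, not vendor data):

```python
def total_bw_gbps(lanes: int, gbps_per_lane: int) -> int:
    """Aggregate bandwidth of one side of a module interface."""
    return lanes * gbps_per_lane

# First-generation 1.6T module: 16x100G host lanes gear down to 8x200G optics.
assert total_bw_gbps(16, 100) == 1600  # host side, Gbps
assert total_bw_gbps(8, 200) == 1600   # optical side, Gbps

# Second-generation 8:8 design: 200G/lane on both sides, so the DSP no longer
# performs a 2:1 lane-count (gearbox) conversion.
assert total_bw_gbps(8, 200) == 1600

print("16:8 host-to-optics lane ratio:", 16 // 8)  # 2:1
print("8:8 host-to-optics lane ratio:", 8 // 8)    # 1:1
```

Either way the module carries 1.6Tbps; what changes with 200G/lane switch ASICs is that the host and optical lane counts match.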

All Eyes on NVIDIA

Aside from CEO Jensen Huang, the DGX GB200 NVL72 was the star of the GTC 2024 keynote. The rack-scale system integrates 72 next-generation Blackwell GPUs connected by NVLink to form “1 Giant GPU.” Jensen’s description of the NVLink passive-copper “backplane” caused a brief panic among investors who believed it somehow replaced InfiniBand, which it does not. The NVL72 represents next-generation AI systems, but Nvidia also revealed new details of its deployed Hopper-generation clusters. Next-generation 800G (XDR) InfiniBand won’t reach customers until 2025, so early Blackwell systems will use 400G (NDR) InfiniBand instead.

Source: NVIDIA

Jensen said the Hopper-generation EOS supercomputer had just come online. This cluster uses 608 NDR switches with 64 ports each for a total of 38,912 switch ports. The system places the leaf switches in racks at the end of the row, so all InfiniBand links employ optical transceivers. We estimate the servers add 5,120 ports for a system total of 44,032 NDR ports…
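The EOS port totals follow from simple multiplication; a quick check (the 5,120 server-port figure is our estimate, as noted above):

```python
# Back-of-the-envelope check of the EOS InfiniBand port counts.
ndr_switches = 608
ports_per_switch = 64
switch_ports = ndr_switches * ports_per_switch  # 38,912 switch ports

server_ports = 5_120                 # our estimate of server-side NDR ports
total_ports = switch_ports + server_ports

print(f"switch ports: {switch_ports:,}")  # 38,912
print(f"system total: {total_ports:,}")   # 44,032
```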

AMD Looks to Infinity for AI Interconnects

With the formal launch of the MI300 GPU, AMD revealed new plans for scaling the multi-GPU interconnects vital to AI-training performance. The company's approach relies on a partner ecosystem, which stands in stark contrast with NVIDIA's end-to-end solutions. The plans revolve around AMD's proprietary Infinity Fabric and its underlying XGMI interconnect.

Infinity Fabric Adopts Switching

As with its prior generation, AMD uses XGMI to connect multiple MI300 GPUs in what it calls a hive. The hive shares a homogeneous memory space formed by the HBM attached to each GPU. In current designs, the GPUs connect directly using XGMI in a mesh or ring topology. Each MI300X GPU has up to seven Infinity Fabric links, each with 16 lanes. The 4th-gen Infinity Fabric supports up to 32Gbps per lane, yielding 128GB/s of bidirectional bandwidth per link. At the MI300 launch, Broadcom announced that its next-generation PCI Express (PCIe) switch chip will add support for XGMI. At last October's…
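AMD's 128GB/s figure follows directly from the lane count and signaling rate; a minimal sketch of that arithmetic (the 896GB/s aggregate is simply derived from the stated per-link numbers):

```python
lanes_per_link = 16
gbps_per_lane = 32  # 4th-gen Infinity Fabric signaling rate

# 16 lanes x 32 Gbps = 512 Gbps per direction = 64 GB/s per direction.
per_direction_gbytes = lanes_per_link * gbps_per_lane / 8
bidirectional_gbytes = 2 * per_direction_gbytes  # 128 GB/s, matching AMD's figure

# An MI300X exposes up to seven such links for direct mesh or ring topologies.
aggregate_gbytes = 7 * bidirectional_gbytes      # 896 GB/s across all links

print(per_direction_gbytes, bidirectional_gbytes, aggregate_gbytes)
```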

White Paper: Broadcom's Amazing Shrinking Router

Communications-service providers are under pressure to transform their networks for greater bandwidth and efficiency while also enabling new revenue-generating services. In some cases, they have disrupted the traditional supply chain, turning to white-box designs powered by merchant silicon. Broadcom has been at the forefront of this trend with its DNX line of Ethernet chips, better known by the Jericho and Qumran code names. Here, we look at the migration to fixed-configuration systems and how Broadcom’s 5nm Qumran3D chip can serve in the next router generation.

Broadcom sponsored the creation of this white paper, but the opinions and analysis are those of the author. Download the white paper for free, no registration required.

Qumran3D

Optics Grab Attention at Hot Chips 2023

August marked the in-person return of the Hot Chips conference at Stanford University in California, and the sold-out 35th edition included plenty of deep technical content. AI/ML garnered lots of attention, and optical interconnects were featured in both chip- and system-level AI and HPC talks. NVIDIA’s chief scientist, Bill Dally, keynoted Day 2 with a talk reviewing how accelerators achieved a 1,000x performance increase over the last 10 years. His big-picture view provided excellent context for AI-system design, but networking received only an honorable mention this year. Instead, Dally discussed future directions for accelerated compute. Following the keynote, an ML-Training session presented talks from Google and Cerebras. The technical lead for TPUs at Google, Norm Jouppi, made it clear he could only discuss the n-1 generation, meaning TPUv4. Meanwhile, Google revealed the TPUv5e at its own Google Cloud Next event the same day but provided only high-level specifications…

Preview: Hot Chips Returns to Stanford for HC35

I will be at Stanford at the end of this month for the in-person return of Hot Chips. As always, the 35th edition (HC35) will have plenty of deep technical content, with AI/ML unsurprisingly getting lots of attention. I'm particularly interested in a set of talks exploring interconnects and networking for AI, HPC, and beyond. Day 2 (Tuesday, August 29) features an ML-Training session with talks from Google and Cerebras. The technical lead for TPUs at Google, Norm Jouppi, will expand on the paper presented at ISCA 2023 describing the TPUv4 supercomputer. That paper revealed Google's use of optical circuit switches in its TPUv4 cluster, following prior disclosures around OCS deployments in its data-center spine layer. Sean Li, cofounder and chief hardware architect at Cerebras, will deliver a talk on the company's cluster architecture built around the CS-2 system and WSE-2 wafer-scale engine. This talk will explore how the MemoryX external-memory system and SwarmX fabric…

Cisco Joins 51.2T Switch-Chip Crowd

Four chip vendors may not sound like a crowd, but in the leading-edge data-center switch segment, it's likely unsustainable over the long term. The problem is the small number of customers for these devices, that is, the hyperscalers. Despite stiff competition, however, Cisco continues to invest in its Silicon One product line, which now includes 5nm switch chips. The new top-end device is the Silicon One G200, a 51.2Tbps chip built around an internally developed 112Gbps serdes. As a refresher, the company announced production of its 7nm 25.6Tbps G100 chip last October along with design wins in new Cisco platforms.

As the figure below shows, Cisco makes some bold claims regarding the G200. The most startling is the statement that the G200 is twice as power efficient as its predecessor. In other words, the G200 dissipates the same power as the G100 at twice the throughput. The company's new serdes design must improve power efficiency, as the move from 7nm to 5nm should account for…
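The efficiency claim is easier to see as energy per bit: the same power at double the throughput halves the picojoules spent per bit. A quick sketch, using a hypothetical power budget (not a Cisco figure) purely to show the ratio:

```python
g100_tbps, g200_tbps = 25.6, 51.2  # G100 vs G200 aggregate throughput

# P is a hypothetical power budget, NOT a disclosed Cisco number; only the
# ratio matters. Watts divided by Tbps conveniently equals pJ/bit.
P_watts = 300.0
g100_pj_per_bit = P_watts / g100_tbps
g200_pj_per_bit = P_watts / g200_tbps

# Same power at 2x throughput -> half the energy per bit.
assert abs(g200_pj_per_bit - g100_pj_per_bit / 2) < 1e-12

# 51.2 Tbps over 112G serdes (100 Gbps of payload per lane) implies 512 lanes.
lanes = 51_200 // 100
print(lanes)  # 512
```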