BWR Episode 12: It's All About the DRAM

As I said at the end of this episode, memory is a world unto itself. A big thanks to Jim Handy for helping us understand what's going on with DRAM, and to MEXT for being our first sponsor. Bringing an AI engine into memory tiering is a major leap over existing techniques like TPP (Transparent Page Placement). Even better, MEXT's approach requires no new hardware, unlike CXL-based memory expansion.

Jim Handy explains why HBM consumes roughly three times the wafers of an equivalent number of DDR bits, a figure known in the industry as the trade ratio. Joe and I also discuss TurboQuant and what's happening with AI quantization. An important distinction: TurboQuant addresses the KV cache, not model weights. Its first application will be on-device inference, meaning it takes no pressure off HBM demand in the near term.
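To make the KV-cache point concrete, here is a minimal sketch of what quantizing a KV cache looks like in general. This is not TurboQuant's actual algorithm, just illustrative per-token absmax int8 quantization with NumPy; the function names and the toy cache shape are my own inventions.

```python
import numpy as np

def quantize_kv_int8(kv: np.ndarray):
    # Per-token absmax scaling: one scale per row (token), values mapped to int8.
    scales = np.abs(kv).max(axis=-1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(kv / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize_kv(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    # Recover approximate fp32 values for use in attention.
    return q.astype(np.float32) * scales

# A toy cache: 4 tokens x 8 head dimensions of fp32 keys.
kv = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_kv_int8(kv)
recon = dequantize_kv(q, s)
print(q.nbytes, kv.nbytes)          # 32 vs 128 bytes: a 4x smaller cache payload
print(np.max(np.abs(recon - kv)))   # rounding error is bounded by half a scale step
```

The appeal for on-device inference is visible even in this toy: the cache shrinks 4x (fp32 to int8) while the dequantized values stay close to the originals, which is why shrinking the KV cache helps memory-constrained devices rather than the HBM supply chain.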
