BWR Episode 12: It's All About the DRAM
As I said at the end of this episode, memory is a world unto itself. A big thanks to Jim Handy for helping us understand what's going on with DRAM, and to MEXT for being our first sponsor. Bringing an AI engine into memory tiering is a major leap over existing techniques like TPP (Transparent Page Placement). Better still, MEXT's approach requires no new hardware, unlike CXL-based memory expansion.
Jim Handy explains why HBM consumes three times as many wafers as DDR for the same number of bits, a figure known as the trade ratio. Joe and I also discuss TurboQuant and what's happening with AI quantization. An important distinction is that TurboQuant compresses the KV cache, not the model weights. Its first application will be on-device inference, so it won't relieve pressure on HBM demand in the near term.
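To make the trade ratio concrete, here is a minimal Python sketch. The 3x figure comes from the episode; the wafer count, the bits-per-wafer value, and the function name `bits_from_wafers` are hypothetical placeholders chosen only for illustration.

```python
# Minimal sketch of the HBM/DDR wafer trade ratio.
# TRADE_RATIO (3x) is the figure cited in the episode; every other number is hypothetical.
TRADE_RATIO = 3.0


def bits_from_wafers(wafers: float, ddr_bits_per_wafer: float,
                     trade_ratio: float = TRADE_RATIO) -> tuple[float, float]:
    """Return (ddr_bits, hbm_bits) obtainable from the same wafer allocation."""
    ddr_bits = wafers * ddr_bits_per_wafer
    # HBM needs trade_ratio times as many wafers per bit,
    # so the same wafers yield proportionally fewer bits.
    hbm_bits = ddr_bits / trade_ratio
    return ddr_bits, hbm_bits


if __name__ == "__main__":
    # Hypothetical: 900 wafers, 100 arbitrary bit-units per wafer when built as DDR.
    ddr, hbm = bits_from_wafers(900, 100)
    print(f"As DDR: {ddr:.0f} units; as HBM: {hbm:.0f} units")
```

Put another way, under a 3x trade ratio every wafer shifted from DDR to HBM yields only a third of the bits it would have produced as DDR.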