The high-performance computing (HPC) memory wall generally refers to the growing disparity between processor speed and memory bandwidth. When processor performance outpaces memory access speeds, this creates a bottleneck in overall system performance, particularly in memory-intensive applications like artificial intelligence (AI).
This article begins by exploring the conventional definition of the memory wall and then looks at an alternative view that compares memory capacity with the growth in the number of parameters in AI models. By either definition, the memory wall has arrived, and it is a serious problem. The article closes with a look at some techniques for climbing over the wall, or at least reducing its height.
Of course, the definition of HPC is evolving; what was considered HPC several years ago no longer meets the latest definition. Judged by the comparison of peak processor performance in floating-point operations per second (FLOPS) versus memory bandwidth, the memory wall has been a problem for over 25 years (Figure 1). While memory performance has improved significantly, the ability to access and transfer data has not kept pace with the processing capability of modern processors.
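To make the mismatch concrete, a minimal sketch is shown below; the peak-FLOPS and bandwidth figures are hypothetical placeholders rather than values from Figure 1, and the kernel is a generic streaming update:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical figures for illustration only, not vendor specifications. */
    double peak_flops = 50e12;   /* 50 TFLOPS peak compute (assumed) */
    double mem_bw     = 2e12;    /* 2 TB/s memory bandwidth (assumed) */

    /* Machine balance: FLOPs that must be performed per byte moved
     * just to keep the compute units busy. */
    double balance = peak_flops / mem_bw;

    /* A streaming kernel such as y[i] += a * x[i] performs 2 FLOPs while
     * moving roughly 24 bytes (read x, read y, write y) per element. */
    double kernel_intensity = 2.0 / 24.0;

    printf("Machine balance:  %5.1f FLOPs/byte\n", balance);
    printf("Kernel intensity: %5.3f FLOPs/byte\n", kernel_intensity);
    printf("Achievable peak:  %5.2f %%\n", 100.0 * kernel_intensity / balance);
    return 0;
}
```

When a kernel's arithmetic intensity sits far below the machine balance, the processor spends most of its time waiting for data, which is the memory wall in concrete terms.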

As a result of the memory wall, the processor spends more and more time waiting for memory. That means some of the capability of expensive high-performance processors goes unused. In HPC applications involving large datasets and complex computations, this can be a serious problem.
The AI perspective
HPC is an important tool for AI, especially for training AI models. When the current wave of AI models emerged around 2015, the number of parameters in a typical model was relatively modest. Those models didn't need maximum HPC performance, so the memory wall experienced by other applications wasn't a concern.
That changed around 2019, when rapid growth in AI model complexity began outstripping increases in processor performance (Figure 2). In the years since, the height of the memory wall for AI applications has continued to grow, and it can be a limiting factor for further advances in AI performance. AI's growing importance has increased the urgency of dealing with the HPC memory wall.
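A back-of-the-envelope calculation illustrates the capacity side of the problem. The parameter count and per-accelerator memory below are illustrative assumptions, not measurements of any particular model or device; 16-bit weights occupy two bytes per parameter, and training adds further memory for activations and optimizer state:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    /* Illustrative assumptions, not measurements of any specific system. */
    double params          = 175e9;  /* model parameters (assumed) */
    double bytes_per_param = 2.0;    /* 16-bit weights */
    double device_mem_gb   = 80.0;   /* HBM capacity per accelerator (assumed) */

    double weights_gb = params * bytes_per_param / 1e9;

    printf("Weights alone: %.0f GB\n", weights_gb);
    printf("Accelerators needed just to hold the weights: %.0f\n",
           ceil(weights_gb / device_mem_gb));
    return 0;
}
```

Under these assumptions, the weights alone exceed the memory of a single accelerator several times over, before any training state is counted.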

Reducing the wall height
As shown in Figure 1 above, multiple generations of graphics double data rate (GDDR) and high-bandwidth memory (HBM) technologies have only slowed the growth of the memory wall; they have not solved the problem.
Several memory-management approaches are also used, including multi-level hierarchical caching, where frequently used data is stored closer to the processor, and prefetch instructions that fetch data into the cache before it is needed, hiding main-memory latency.
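As a sketch of the prefetch idea, GCC and Clang expose the __builtin_prefetch hint, which asks the hardware to start pulling data toward the cache before it is needed; the prefetch distance used below is an illustrative tuning choice:

```c
#include <stddef.h>

/* Sum an array while hinting upcoming elements into the cache so that
 * memory loads overlap with computation instead of stalling it. */
double prefetch_sum(const double *x, size_t n) {
    const size_t dist = 16;   /* prefetch distance in elements (illustrative) */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + dist < n)
            __builtin_prefetch(&x[i + dist], 0, 1); /* 0 = read, 1 = low temporal locality */
        sum += x[i];
    }
    return sum;
}
```

In practice, hardware prefetchers already handle simple streaming patterns well, so explicit prefetch hints tend to pay off mainly on irregular or indirect access patterns.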
Optimized algorithms that use memory efficiently can also help mitigate the effects of the memory wall. Structuring data for more efficient access can minimize cache misses and improve performance.
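One common form of such restructuring is loop blocking (tiling), sketched below for matrix multiplication: each small tile is reused while it is still resident in cache instead of being re-streamed from main memory. The tile size is an assumed, machine-dependent tuning parameter.

```c
#include <stddef.h>

#define TILE 64   /* tile edge, sized to keep the working set cache-resident (assumed) */

/* Accumulates C += A * B for n x n row-major matrices. Working on
 * TILE x TILE blocks lets each block of A, B, and C be reused from cache
 * rather than being fetched repeatedly from main memory. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C) {
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < ii + TILE && i < n; i++)
                    for (size_t k = kk; k < kk + TILE && k < n; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + TILE && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```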
Rather than relying on improvements in raw memory performance, recent developments have focused on new computing and memory architectures to scale the wall.
Examples of new architectural approaches include compute-in-memory (CIM), also called processing-in-memory (PIM), and in-memory computing (IMC). CIM is a hardware-based architecture that performs computations directly within the memory array, reducing the need for data transfers and speeding computation.
IMC is a combined hardware and software approach in which data is processed in RAM to improve performance, often using multiple cores and parallel processing. Both CIM and IMC can benefit from the Compute Express Link (CXL) standard.
Conquering the wall with CXL
CXL-attached memory addresses the HPC memory wall by enabling efficient memory sharing and expanding the memory capacity and bandwidth available to multiple processors. It leverages the PCIe physical layer to provide low-latency and high-bandwidth communication, facilitating efficient data transfer between the CPU and attached memory.
CXL ensures that memory accesses are coherent and that all processors have a consistent view of the memory, simplifying memory management. It provides the structure and tools for more efficient memory use, helping HPC systems conquer the memory wall challenge.
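From the software side, CXL-attached memory on Linux typically appears as an additional, often CPU-less, NUMA node, so existing NUMA APIs can place data on it. The sketch below uses libnuma and simply assumes the highest-numbered node is the CXL expander, which is an illustrative assumption rather than a guaranteed mapping:

```c
/* Build with: gcc cxl_alloc.c -lnuma
 * Assumes the CXL memory expander is exposed as the highest-numbered
 * NUMA node; on a real system, check numactl -H to confirm. */
#include <numa.h>
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int cxl_node = numa_max_node();   /* assumed CXL expander node */
    size_t bytes = 1UL << 30;         /* 1 GiB working buffer */

    /* Place the buffer's pages on the CXL-attached node. */
    double *buf = numa_alloc_onnode(bytes, cxl_node);
    if (!buf) {
        fprintf(stderr, "allocation on node %d failed\n", cxl_node);
        return 1;
    }

    /* ... use buf as ordinary, cache-coherent memory ... */

    numa_free(buf, bytes);
    return 0;
}
```

Because CXL.mem traffic is cache coherent, the buffer behaves like ordinary memory; since its latency is higher than local DRAM, capacity-hungry but less latency-sensitive data is the natural fit.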
Summary
While the HPC memory wall typically refers to the growing disparity between increasing processor speed and lagging memory bandwidth, it can also be defined relative to the growing complexity of AI models. By either definition, it's growing and is an increasingly serious challenge. Designers have several tools available for scaling the HPC memory wall or reducing its height.