Microcontroller Tips

Microcontroller engineering resources, new microcontroller products and electronics engineering news


What is the HPC memory wall and how can you climb over it?

May 7, 2025 By Jeff Shepard

The high-performance computing (HPC) memory wall generally refers to the growing disparity between processor speed and memory bandwidth. When processor performance outpaces memory access speed, memory becomes a bottleneck in overall system performance, particularly in memory-intensive applications like artificial intelligence (AI).

This article begins by exploring the conventional definition of the memory wall and then looks at an alternative view that compares memory capacity with the growth in the number of parameters in AI models. By either definition, the memory wall has arrived, and it is a serious problem. The article closes with a look at some techniques for climbing over the wall, or at least reducing its height.

Of course, the definition of HPC is evolving; what was considered HPC several years ago no longer meets the latest definition. Measured as the gap between peak processor performance in floating-point operations per second (FLOPS) and memory bandwidth, the memory wall has been a problem for over 25 years (Figure 1). While memory performance has improved significantly, the ability to access and transfer data has not kept pace with processor compute capability.

Figure 1. The HPC memory wall is the gap between processor performance and memory bandwidth. (Image: Astera Labs)

As a result of the memory wall, the processor spends more and more time waiting on memory, which means some of the capability of expensive high-performance processors goes unused. In HPC applications involving large databases and complex computations, this can be a serious problem.

The AI perspective

HPC is an important tool for AI, especially for training AI models. When the current wave of AI models emerged in about 2015, the number of parameters in a typical model was relatively modest. Training didn't need maximum HPC performance, so the memory wall experienced with other applications wasn't a concern.

That changed in about 2019, when rapid growth in AI model complexity outstripped increases in processor performance (Figure 2). In the ensuing years, the height of the memory wall for AI applications has continued to grow and can be a limiting factor for further advances in AI performance. AI's growing importance has increased the urgency of dealing with the HPC memory wall.

Figure 2. The HPC memory wall can also be viewed in terms of the increasing complexity of AI models. (Image: Ayar Labs)

Reducing the wall height

As shown in Figure 1, multiple generations of graphics double data rate (GDDR) and high bandwidth memory (HBM) technologies have only slowed the growth of the memory wall; they have not solved the problem.

Several memory management approaches are also used, including multi-level hierarchical caching, where frequently used data is stored closer to the processor, and prefetch instructions that bring data into cache before it is needed, hiding main memory latency.
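As a rough sketch of the prefetching idea, the loop below uses the GCC/Clang `__builtin_prefetch` intrinsic to request data a fixed distance ahead of the current iteration, so it is already in cache when the loop reaches it. The distance of 16 elements is an arbitrary assumption; real values are tuned per platform by profiling.

```c
#include <stddef.h>

/* Tunable assumption: how many elements ahead to prefetch. */
#define PREFETCH_DIST 16

/* Sum an array while issuing software prefetches ahead of the
 * current index. The prefetch is a hint only; the result is
 * identical with or without it. */
double sum_with_prefetch(const double *a, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + PREFETCH_DIST < n)
            /* args: address, 0 = read, 1 = low temporal locality */
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 1);
#endif
        total += a[i];
    }
    return total;
}
```

Hardware prefetchers already handle simple sequential patterns like this one well; explicit prefetching pays off mainly on irregular but predictable access patterns, such as pointer-chasing with known next addresses.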

Optimized algorithms that use memory efficiently can also help mitigate the effects of the memory wall. Structuring data for locality minimizes cache misses and improves performance.
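A classic example of structuring access for locality: traversing a row-major matrix row by row touches memory sequentially, while traversing it column by column strides across cache lines. Both functions below compute the same sum; only the access pattern, and therefore the cache-miss behavior, differs.

```c
#include <stddef.h>

enum { N = 512 };   /* illustrative matrix size */

/* Row-order walk of a row-major matrix: stride-1 access that reuses
 * each cache line fully before moving on. */
double sum_row_major(double m[N][N]) {
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Column-order walk of the same matrix: each step strides N elements,
 * touching a new cache line almost every access. Same result, far
 * worse cache behavior on large matrices. */
double sum_col_major(double m[N][N]) {
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```

Swapping the loop order is the whole optimization; for matrices much larger than the last-level cache, the stride-1 version can run several times faster.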

Rather than improvements in raw memory performance, recent developments have focused on new computing and memory architectures to scale the wall.

Examples of new architectural approaches include compute-in-memory (CIM), also called processing-in-memory (PIM), and in-memory computing (IMC). CIM is a hardware-based architecture that performs computations directly within the memory array itself, reducing data movement and speeding computation.

IMC is a combined hardware and software approach in which data is processed in RAM to improve performance, often using multiple cores and parallel processing. Both CIM and IMC can benefit from the Compute Express Link (CXL) standard.

Conquering the wall with CXL

CXL-attached memory addresses the HPC memory wall by enabling efficient memory sharing and expanding the memory capacity and bandwidth available to multiple processors. It leverages the PCIe physical layer to provide low-latency and high-bandwidth communication, facilitating efficient data transfer between the CPU and attached memory.

CXL ensures that memory accesses are coherent and that all processors have a consistent view of the memory, simplifying memory management. It provides the structure and tools for more efficient memory use, helping HPC systems conquer the memory wall challenge.

Summary

While the HPC memory wall typically refers to the growing disparity between increasing processor speed and lagging memory bandwidths, it can also be defined relative to the growing complexity of AI models. By either definition, it’s growing and is an increasingly serious challenge. Designers have several tools available for scaling or reducing the height of the HPC memory wall.

References

Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning, arXiv
Breaking Through the Memory Wall, Astera Labs
High Performance Computing, Komprise
Memory Sharing with CXL: Hardware and Software Design Approaches, arXiv
RoCE Technology for HPC: Test Data and Practical Implementations on SONiC Switch, Asterfusion Data Technologies
Tearing Down the Memory Wall, Juniper Networks
What is high-performance computing (HPC)?, IBM
What is the memory wall?, Ayar Labs




Copyright © 2025 · WTWH Media LLC and its licensors. All rights reserved.