If you are running AI workloads, here is something that might surprise you: your processors waste more energy shuffling data around than performing the calculations you care about. This inefficiency is becoming a serious limit for the next generation of artificial intelligence systems. As neural networks grow to billions of parameters, traditional von Neumann architectures are hitting physical barriers.
This article explains what compute-in-memory (CIM) technology is and how it works. We will examine how current implementations are already delivering substantial efficiency gains over conventional processors, and we will explore why this approach could reshape AI computing.
Challenges with traditional computers
Traditional computers keep computational units and memory systems separate, constantly exchanging data through energy-intensive transfers. Early processing-in-memory proposals such as Terasys, IRAM, and FlexRAM emerged in the 1990s, but these initial attempts had major limitations: the CMOS technology of the time was not advanced enough, and application demands were different.
The traditional von Neumann architecture (Figure 1a) maintains a strict separation between the central processing unit and memory, requiring constant data transfers across a bandwidth-limited bus. This separation creates the “memory wall” problem, which particularly hurts data-intensive AI workloads.
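To put rough numbers on the memory wall, consider per-operation energy figures along the lines of those often quoted for a ~45 nm process (exact values vary widely by node and design; the constants below are illustrative assumptions, not measurements from this article’s sources):

```python
# Rough energy accounting for a memory-bound workload on a von Neumann
# machine. Per-operation energies are illustrative figures in the spirit
# of commonly cited ~45 nm estimates; treat them as assumptions.

E_FP32_MAC_PJ = 4.6    # ~pJ for one 32-bit multiply-accumulate
E_SRAM_32B_PJ = 5.0    # ~pJ to read 32 bits from a small on-chip SRAM
E_DRAM_32B_PJ = 640.0  # ~pJ to read 32 bits from off-chip DRAM

def energy_per_mac_pj(operands_from_dram: int) -> float:
    """Total energy for one MAC when `operands_from_dram` of its two
    operands miss on-chip memory and must be fetched from DRAM."""
    on_chip = 2 - operands_from_dram
    return (E_FP32_MAC_PJ
            + on_chip * E_SRAM_32B_PJ
            + operands_from_dram * E_DRAM_32B_PJ)

for n in (0, 1, 2):
    total = energy_per_mac_pj(n)
    print(f"{n} DRAM operand(s): {total:7.1f} pJ/MAC "
          f"({total / E_FP32_MAC_PJ:6.1f}x the arithmetic itself)")
```

Even under these generous assumptions, a single off-chip fetch costs two orders of magnitude more than the arithmetic it feeds, which is the imbalance CIM attacks.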

Understanding compute-in-memory
CIM, also known as processing-in-memory, departs fundamentally from the von Neumann architecture that has dominated computing for decades: it performs computations directly within, or very close to, where the data is stored.
Near-memory computing (Figure 1b) brings memory closer to processing units, while true in-memory computing approaches (Figures 1c and 1d) embed computational capabilities directly within the memory arrays themselves. This integration of storage and logic reduces data movement, decreasing both latency and energy consumption, the two major bottlenecks in modern AI applications.
The rapid growth of big data and machine learning applications, which demand high computational efficiency, has driven the rise of CIM.
Technical implementation approaches
CIM can be implemented using various memory technologies, each offering distinct advantages for different AI workloads.
Static Random-Access Memory (SRAM) has emerged as the most popular choice for CIM implementations. Its speed, robustness, and compatibility with existing fabrication processes make it ideal for AI accelerators. Researchers have developed modified SRAM bitcell structures, including 8T, 9T, and 10T configurations, along with auxiliary peripheral circuits to enhance performance.
Figure 2 illustrates the comprehensive nature of SRAM-based CIM development, showing how circuit-level innovations enable sophisticated computing functions and real-world AI applications. At the circuit level (Figure 2a), SRAM-based CIM requires specialized bitcell structures and peripheral circuits, including analog-to-digital converters, timing-control systems, and redundant reference columns. These circuit innovations enable a range of functional capabilities (Figure 2b).

Digital operations include Boolean logic and content-addressable memory search. Mixed-signal operations support the multiply-accumulate (MAC) and sum-of-absolute-differences computations that are fundamental to neural networks.
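The digital Boolean functions can be understood with a simple behavioral model. In one common SRAM CIM scheme, asserting two wordlines at once lets the shared bitlines evaluate logic on the stored words: a precharged bitline stays high only if both selected cells store 1 (AND), while the complementary bitline yields NOR. The sketch below is a simplified Python model of that behavior with arbitrary data, not a circuit-level simulation:

```python
import numpy as np

# Behavioral sketch of digital in-memory Boolean logic via multi-row
# activation in an SRAM array (an assumption-level model). Sensing the
# bitline gives AND of the two selected rows; the complementary bitline
# gives NOR; XOR is derived from the two without reading the rows out.

rng = np.random.default_rng(seed=0)
array = rng.integers(0, 2, size=(4, 8))  # 4 stored words, 8 bits each

def in_memory_logic(row_a: int, row_b: int):
    a, b = array[row_a], array[row_b]
    bl_and = a & b                       # sensed on the bitline
    blb_nor = (1 - a) & (1 - b)          # sensed on the complement bitline
    xor = (1 - bl_and) & (1 - blb_nor)   # differs exactly when a != b
    return bl_and, blb_nor, xor

and_, nor_, xor_ = in_memory_logic(0, 1)
print("row0:", array[0])
print("row1:", array[1])
print("AND :", and_)
print("NOR :", nor_)
print("XOR :", xor_)
```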
As demonstrated in the application layer (Figure 2c), these technical capabilities translate into accelerated AI algorithms. These include convolutional neural networks for image classification, AES encryption for security applications, and k-nearest neighbor algorithms for pattern recognition. However, SRAM faces challenges, including low density and high leakage current, that limit its scalability for large AI processors.
Dynamic Random-Access Memory (DRAM), while less common for direct in-memory computation because of its refresh requirements, plays a central role in near-memory processing architectures. Technologies such as High Bandwidth Memory and Hybrid Memory Cube use 3D stacking to shrink the physical distance between computation and memory.
Resistive Random-Access Memory (ReRAM) is the most promising emerging technology for CIM. This non-volatile memory offers high density, integrates well with back-end-of-line fabrication processes, and is particularly well suited to the matrix-vector multiplication operations that are fundamental to neural networks.
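To make the matrix-vector connection concrete, here is a minimal numerical sketch of an idealized ReRAM crossbar, assuming linear devices and ignoring wire resistance and other non-idealities (array sizes and values are arbitrary). Weights are stored as cell conductances; inputs are applied as wordline voltages. Ohm’s law makes each cell contribute a current proportional to voltage times conductance, and Kirchhoff’s current law sums each column, so every bitline delivers one dot product in a single analog step:

```python
import numpy as np

# Idealized crossbar MVM: I[j] = sum_i V[i] * G[i, j].
rng = np.random.default_rng(seed=1)
G = rng.uniform(1e-6, 1e-4, size=(64, 32))  # cell conductances (siemens)
V = rng.uniform(0.0, 0.2, size=64)          # wordline read voltages (volts)

I_bitlines = V @ G  # what the crossbar computes physically, in parallel

# The same result computed explicitly, cell by cell, for comparison:
I_check = np.zeros(32)
for j in range(32):
    for i in range(64):
        I_check[j] += V[i] * G[i, j]

assert np.allclose(I_bitlines, I_check)
print(f"peak bitline current: {I_bitlines.max():.3e} A")
```

The entire 64x32 multiplication happens in one read cycle, which is why crossbars map so naturally onto neural-network layers.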
CIM implementations also vary in their computational domains. Analog CIM exploits the physical properties of memory cells, performing operations through current summation and charge collection; it offers higher weight density but is susceptible to noise. Digital CIM provides high accuracy at the cost of one device per bit. Mixed-signal approaches try to balance the benefits of both.
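The trade-off between the domains can be illustrated with a toy model. The sketch below compares an exact digital MAC result with an analog column sum that picks up read noise and ADC quantization error; the noise level, ADC range, and weight encoding are all illustrative assumptions:

```python
import numpy as np

# Digital CIM reproduces the exact MAC; analog CIM sums currents
# physically, then an ADC digitizes each column sum, adding read noise
# and quantization error. All model parameters here are assumptions.

rng = np.random.default_rng(seed=2)
W = rng.integers(-1, 2, size=(128, 16)).astype(float)  # ternary weights
x = rng.integers(0, 2, size=128).astype(float)         # binary activations

exact = x @ W                                  # digital domain: bit-exact

noisy = exact + rng.normal(0.0, 0.5, size=16)  # analog sums + read noise
adc_bits = 6
lo, hi = -32.0, 32.0                           # assumed ADC input range
step = (hi - lo) / 2 ** adc_bits
codes = np.clip(np.round((noisy - lo) / step), 0, 2 ** adc_bits - 1)
analog = codes * step + lo                     # reconstructed column sums

print("max |analog - exact| =", np.abs(analog - exact).max())
```

Neural networks tolerate small errors of this kind well, which is why analog CIM remains attractive despite its noise; digital CIM avoids the error entirely but spends more area per stored bit.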
Transformative benefits for AI applications
The practical benefits of CIM for AI are both measurable and compelling, as demonstrated in Figure 3. The energy efficiency comparison reveals the advantages of CIM architectures across different technology nodes. While traditional CPUs achieve only 0.01-0.1 TOPS/W (tera operations per second per watt), digital in-memory architectures deliver 1-100 TOPS/W, representing 100 to 1000 times better energy efficiency. Advanced CIM approaches like silicon photonics and optical systems push efficiency even higher.

The energy breakdown analysis (Figure 3, right) reveals why CIM is effective. Traditional CPUs are dominated by memory access energy (blue bars), while CIM architectures reduce this bottleneck by performing computation directly in memory. This fundamental advantage translates to measurable performance improvements across AI applications.
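A quick unit conversion shows what these efficiency figures mean per operation: one TOPS/W is 10^12 operations per joule, so X TOPS/W corresponds to 1/X picojoules per operation. The short sketch below applies that conversion to the endpoints quoted above (the labels follow the Figure 3 discussion):

```python
# Convert TOPS/W ratings into energy per operation:
# X TOPS/W = X * 1e12 ops per joule, i.e. (1 / X) pJ per operation.

def pj_per_op(tops_per_watt: float) -> float:
    """Energy per operation in picojoules for a given TOPS/W rating."""
    return 1.0 / tops_per_watt

for label, tops_w in [
    ("Traditional CPU, low end", 0.01),
    ("Traditional CPU, high end", 0.1),
    ("Digital CIM, low end", 1.0),
    ("Digital CIM, high end", 100.0),
]:
    print(f"{label:26s} {tops_w:7.2f} TOPS/W -> "
          f"{pj_per_op(tops_w):8.2f} pJ/op")
```

In other words, moving from 0.01 TOPS/W to 1 TOPS/W cuts the energy budget from 100 pJ to 1 pJ per operation, the 100x end of the range cited above.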
The real-world impact of CIM on transformer and large language model (LLM) acceleration is demonstrated by the recent implementations listed in Table 1. Various CIM architectures have achieved speedups ranging from 2.3x to 200x over NVIDIA GPUs, with energy-efficiency gains of up to 1894x. These results span multiple transformer models, including BERT, GPT, and RoBERTa, demonstrating CIM’s broad applicability to modern language models.

Summary
As we enter the post-Moore’s Law era, CIM represents a significant architectural shift that addresses key challenges in AI computing. The technology is advancing rapidly, with SRAM-based solutions approaching commercial viability and emerging non-volatile memory solutions showing potential for future applications. As AI continues to expand across technology applications, CIM could become an important enabling technology for more efficient AI deployment.
References
Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference, arXiv
Energy-efficient computing-in-memory architecture for AI processor – device, circuit, architecture perspective, Science China
A review on SRAM-based computing in-memory: Circuits, functions, and applications, ResearchGate
An Overview of Processing-in-Memory Circuits for Artificial Intelligence and Machine Learning, IEEE
Analog, In-memory Compute Architectures for Artificial Intelligence, ResearchGate
In-Memory Computing for Machine Learning and Deep Learning, IEEE
Emerging In-memory Computing for Neural Networks, Fraunhofer
Related EE World content
What is DDR (Double Data Rate) Memory and SDRAM Memory
Memory-centric computing and memory system architectures
What is DRAM (Dynamic Random Access Memory) vs SRAM?
Memory basics – volatile, non-volatile and persistent
Memory technology from Floating Gates to FRAM
Memory technologies and packaging options