• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer

Microcontroller Tips

Microcontroller engineering resources, new microcontroller products and electronics engineering news

  • Products
    • 8-bit
    • 16-bit
    • 32-bit
    • 64-bit
  • Applications
    • 5G
    • Automotive
    • Connectivity
    • Consumer Electronics
    • EV Engineering
    • Industrial
    • IoT
    • Medical
    • Security
    • Telecommunications
    • Wearables
    • Wireless
  • Learn
    • eBooks / Tech Tips
    • EE Training Days
    • FAQs
    • Learning Center
    • Tech Toolboxes
    • Webinars/Digital Events
  • Resources
    • Design Guide Library
    • DesignFast
    • LEAP Awards
    • Podcasts
    • White Papers
  • Videos
    • EE Videos & Interviews
    • Teardown Videos
  • EE Forums
    • EDABoard.com
    • Electro-Tech-Online.com
  • Engineering Training Days
  • Advertise
  • Subscribe

What’s a Neural microcontroller?

May 30, 2022 By Jeff Shepard

The ability to run neural networks (NNs) on MCUs is becoming increasingly important to support artificial intelligence (AI) and machine learning (ML) in the Internet of Things (IoT) nodes and other embedded edge applications. Unfortunately, running NNs on MCUs is challenging due to the relatively small memory capacities of most MCUs.

This article details the memory challenges of running NNs on MCUs and looks at possible system-level solutions. It presents recently announced MCUs with embedded NN accelerators and closes by looking at how the Glow machine learning compiler for NNs can help reduce memory requirements.

Running NNs on MCUs (sometimes called tinyML) offers advantages over sending raw data to the cloud for analysis and action. Those advantages include the ability to tolerate poor or even no network connectivity and safeguard data privacy and security. MCU memory capacities are often limited to the main memory of hundreds of KB of SRAM, often less, and byte-addressable Flash of no more than a few MBs for read-only data.

To achieve high accuracy, most NNs require larger memory capacities. The memory needed by a NN includes read-only parameters and so-called feature maps that contain intermediate and final results. It can be tempting to process an NN layer on an MCU in the embedded memory before loading the next layer, but it’s often impractical. A single NN layer’s parameters and feature maps can require up to 100 MB of storage, exceeding the MCU memory size by as much as two orders of magnitude. Recently developed NNs with higher accuracies require even more memory, resulting in a widening gap between the available memory on most MCUs and the memory requirements of NNs (Figure 1).

Figure 1: The available memory on most MCUs is much too small to support the needs of the majority of NNs. (Image: Arxiv)

One solution to address MCU memory limitations is to dynamically swap NN data blocks between the MCU SRAM and a larger external (out-of-core) cash memory. Out-of-core NN implementations can suffer from several limitations, including: execution slowdown, storage wear out, higher energy consumption, and data security. If these concerns can be adequately addressed in a specific application, an MCU can be used to run large NNs with full accuracy and generality.

One approach to out-of-core NN implementation is to split one NN layer into a series of tiles small enough to fit into the MCU memory. This approach has been successfully applied to NN systems on servers where the NN tiles are swapped between the CPU/GPU memory and the server’s memory. Most embedded systems don’t have access to the large memory spaces available on servers. Using memory swapping approaches with MCUs can run into problems using a relatively small external SRAM or an SD card, such as lower SD card durability and reliability, slower execution due to I/O operations, higher energy consumption, and safety and security of out-of-core NN data storage.

Another approach to overcoming MCU memory limitations is optimizing the NN more completely using techniques such as model compression, parameter quantization, and designing tiny NNs from scratch. These approaches involve varying tradeoffs between model accuracy and generality, or both. In most cases, the techniques used to fit an NN into the memory space of an MCU result in the NN becoming too inaccurate (< 60% accuracy) or too specialized and not generalized enough (the NN can only detect a few object classes). These challenges can disqualify the use of MCUs where NNs with high accuracy and generality are needed, even if inference delays can be tolerated, such as:

  • NN inference on slowly changing signals such as monitoring crop health by analyzing hourly photos or traffic patterns by analyzing video frames taken every 20-30 minutes
  • Profiling NNs on the device by occasionally running a full-blown NN to estimate the accuracy of long-running smaller NNs
  • Transfer learning includes retraining NNs on MCUs with data collected from deployment every hour or day

NN accelerators embedded in MCUs

Many of the challenges of implementing NNs on MCU are being addressed by MCUs with embedded NN accelerators. These advanced MCUs are an emerging device category that promises to provide designers with new opportunities to develop IoT node and edge ML solutions. For example, an MCU with a hardware-based embedded convolutional neural network (CNN) accelerator enables battery-powered applications to execute AI inferences while spending only microjoules of energy (Figure 2).

Figure 2: Neural MCU block diagram showing the basic MCU blocks (upper left) and the CNN accelerator section (right). (Image: Maxim)

The MCU with an embedded CNN accelerator is a system on chip combining an Arm Cortex-M4 with a RISC-V core that can execute application and control codes as well as drive the CNN accelerator. The CNN engine has a weight storage memory of 442KB and can support 1-, 2-, 4-, and 8-bit weights (supporting networks of up to 3.5 million weights). On the fly, AI network updates are supported by the SRAM-based CNN weight memory structure. The architecture is flexible and allows CNNs to be trained using conventional toolsets such as PyTorch and TensorFlow.

Another MCU supplier has pre-announced developing a neural processing unit integrated with an ARM Cortex core. The new neural MCU is scheduled to ship later this year and will provide the same level of AI performance as a quad-core processor with an AI accelerator but at one-tenth the cost and one-twelfth the power consumption. Additional neural MCUs are expected to emerge in the near future.

Glow for smaller NN memories

Glow (graph lowering) is a machine learning compiler for neural network graphs. It’s available on Github and is designed to optimize the neural network graphs and generate code for various hardware devices. Two versions of Glow are available, one for Ahead of Time (AOT) and one for Just in Time (JIT) compilations. As the names suggest, AOT compilation is performed offline (ahead of time) and generates an object file (bundle) which is later linked with the application code, while JIT compilation is performed at runtime just before the model is executed.

MCUs are available that support AOT compilation using Glow. The compiler converts the neural networks into object files, which the user converts into a binary image for increased performance and a smaller memory footprint than a JIT (runtime) inference engine. In this case, Glow is used as a software back-end for the PyTorch machine learning framework and the ONNX model format (Figure 3).

Figure 3: Example of an AOT compilation flow diagram using Glow. (Image: NXP)

The Glow NN complier lowers a NN into a two-phase, strongly-typed intermediate representation. Domain-specific optimizations are performed in the first phase, while the second phase performs optimizations focused on specialized back-end hardware features. NNs on MCUs are available that combine support for Arm Cortex-M cores and Cadence Tensilica HiFi 4 DSP support, accelerating performance by utilizing Arm CMSIS-NN and HiFi NN libraries, respectively. Its features include:

  • Lower latency and smaller solution size for edge inference NNs.
  • Accelerate NN applications with CMSIS-NN and Cadence HiFi NN Library
  • Speed time to market using the available software development kit
  • Flexible implementation since Glow is open source with Apache License 2.0

Summary

Running NNs on MCUs is important for IoT nodes and other embedded edge applications, but it can be challenging due to MCU memory limitations. Several approaches have been developed to address memory limitations, including out-of-core designs that swap blocks of NN data between the MCU memory and an external memory and various NN software ‘optimization’ techniques. Unfortunately, these approaches involve tradeoffs between model accuracy and generality, which result in the NN becoming too inaccurate and/or too specialized to be of use in practical applications. The emergence of MCUs with integrated NN accelerators is beginning to address those concerns and enables the development of practical NN implementations for IoT and edge applications. Finally, the availability of the Glow NN compiler gives designers an additional tool for optimizing NN for smaller applications.

References

Artificial Intelligence Microcontroller with Ultra-Low-Power Convolutional Neural Network Accelerator, Maxim Integrated
eIQ® Inference with Glow NN, NXP
Enabling Large Neural Networks on Tiny Microcontrollers with Swapping, Arxiv
Glow, Meta AI

You may also like:


  • How does a recurrent neural network (RNN) remember?

  • What’s the difference between GPUs and TPUs for AI processing?

  • What are the top five neural network architectures?

  • What’s special about embedded microcontrollers?
  • neural networks
    Microcontrollers with neural networks: what are they?

Filed Under: Accelerators, Applications, Embedded, FAQ, Featured, microcontroller, Neural Networking, Products Tagged With: FAQ

Primary Sidebar

Featured Contributions

Five challenges for developing next-generation ADAS and autonomous vehicles

Securing IoT devices against quantum computing risks

RISC-V implementation strategies for certification of safety-critical systems

What’s new with Matter: how Matter 1.4 is reshaping interoperability and energy management

Edge AI: Revolutionizing real-time data processing and automation

More Featured Contributions

EE TECH TOOLBOX

“ee
Tech Toolbox: Internet of Things
Explore practical strategies for minimizing attack surfaces, managing memory efficiently, and securing firmware. Download now to ensure your IoT implementations remain secure, efficient, and future-ready.

EE Learning Center

EE Learning Center

EE ENGINEERING TRAINING DAYS

engineering
“bills
“microcontroller
EXPAND YOUR KNOWLEDGE AND STAY CONNECTED
Get the latest info on technologies, tools and strategies for EE professionals.

RSS Current EDABoard.com discussions

  • Editing posts
  • avoiding mixer compression when acting as a phase detector
  • Crude Powerline FSK comms literally shorts the power bus at a certain frequency?
  • ISL8117 buck converter blowing up
  • audio transformers impedance

RSS Current Electro-Tech-Online.com Discussions

  • RS485 bus: common ground wire needed or not?
  • Kawai KDP 80 Electronic Piano Dead
  • Good Eats
  • What part is this marked .AC ?
  • Photo interrupter Connections

DesignFast

Design Fast Logo
Component Selection Made Simple.

Try it Today
design fast globle

Footer

Microcontroller Tips

EE World Online Network

  • 5G Technology World
  • EE World Online
  • Engineers Garage
  • Analog IC Tips
  • Battery Power Tips
  • Connector Tips
  • DesignFast
  • EDA Board Forums
  • Electro Tech Online Forums
  • EV Engineering
  • Power Electronic Tips
  • Sensor Tips
  • Test and Measurement Tips

Microcontroller Tips

  • Subscribe to our newsletter
  • Advertise with us
  • Contact us
  • About us

Copyright © 2025 · WTWH Media LLC and its licensors. All rights reserved.
The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media.

Privacy Policy