TY - JOUR
T1 - LSTM Hardware Inference Accelerator for LiteRT
AU - Mannes, G.
AU - Manor, E.
AU - Greenberg, S.
N1 - Publisher Copyright:
© 2004-2012 IEEE.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - The efficient deployment of recurrent neural networks (RNNs), particularly long short-term memory (LSTM) architectures, on edge devices has become increasingly important due to their ability to model nonlinear time-variant dynamics. However, the computational demands of LSTM inference often exceed the capabilities of resource-constrained, microcontroller-based IoT devices. Efficiently mapping the computational load onto hardware and software resources is a key challenge in improving performance while maintaining low power and a small area footprint. This paper presents a hardware-software framework that accelerates LSTM inference on edge devices by combining a modified LiteRT (formerly TensorFlow Lite) model running on a microcontroller (MCU) with a dedicated LSTM engine in a Neural Processing Unit (NPU) accelerator. To evaluate the trade-offs between accuracy, latency, and energy efficiency, we introduce an LSTM benchmark suite for ultra-low-power TinyML systems. Using this framework, experiments on various LiteRT-based LSTM architectures demonstrate speedups of up to 300x over software-only implementations; for instance, the runtime of the human activity recognition (HAR) classification task is reduced from 1.8 seconds to just 6 milliseconds.
AB - The efficient deployment of recurrent neural networks (RNNs), particularly long short-term memory (LSTM) architectures, on edge devices has become increasingly important due to their ability to model nonlinear time-variant dynamics. However, the computational demands of LSTM inference often exceed the capabilities of resource-constrained, microcontroller-based IoT devices. Efficiently mapping the computational load onto hardware and software resources is a key challenge in improving performance while maintaining low power and a small area footprint. This paper presents a hardware-software framework that accelerates LSTM inference on edge devices by combining a modified LiteRT (formerly TensorFlow Lite) model running on a microcontroller (MCU) with a dedicated LSTM engine in a Neural Processing Unit (NPU) accelerator. To evaluate the trade-offs between accuracy, latency, and energy efficiency, we introduce an LSTM benchmark suite for ultra-low-power TinyML systems. Using this framework, experiments on various LiteRT-based LSTM architectures demonstrate speedups of up to 300x over software-only implementations; for instance, the runtime of the human activity recognition (HAR) classification task is reduced from 1.8 seconds to just 6 milliseconds.
KW - LSTM
KW - LiteRT
KW - TensorFlow Lite for Microcontrollers
KW - TinyML
KW - hardware-software co-design
KW - neural processing unit
UR - https://www.scopus.com/pages/publications/105017062353
U2 - 10.1109/TCSI.2025.3609633
DO - 10.1109/TCSI.2025.3609633
M3 - Article
AN - SCOPUS:105017062353
SN - 1549-8328
JO - IEEE Transactions on Circuits and Systems I: Regular Papers
JF - IEEE Transactions on Circuits and Systems I: Regular Papers
ER -