LSTM Hardware Inference Accelerator for LiteRT

G. Mannes, E. Manor, S. Greenberg

Research output: Contribution to journal › Article › peer-review

Abstract

The efficient deployment of Recurrent Neural Networks (RNNs), particularly long short-term memory (LSTM) architectures, on edge devices has become increasingly important due to their ability to model nonlinear time-variant dynamics. However, the computational demands of LSTM inference often exceed the capabilities of resource-constrained microcontroller-based IoT devices. Efficient mapping of the computational load onto hardware and software resources is a key challenge for improving performance while maintaining low power and a small area footprint. This paper presents a hardware-software framework that accelerates LSTM inference on edge devices by combining a modified LiteRT (formerly TensorFlow Lite) model running on a microcontroller (MCU) with a dedicated LSTM engine in a Neural Processing Unit (NPU) accelerator. To evaluate trade-offs between accuracy, latency, and energy efficiency, we introduce an LSTM benchmark suite for ultra-low-power tinyML systems. Using this framework, experiments on various LiteRT-based LSTM architectures demonstrate up to a 300x speedup compared to software-only implementations. For instance, the runtime for the Human Activity Recognition (HAR) classification task is reduced from 1.8 seconds to just 6 milliseconds.
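To make the accelerated workload concrete, the sketch below shows the standard LSTM cell update that dominates per-timestep inference cost on an MCU: four gate matrix-vector products followed by elementwise nonlinearities. This is the textbook LSTM recurrence, not the paper's accelerator design; all function and variable names here are illustrative.

```python
# Standard LSTM cell recurrence (the compute kernel a dedicated LSTM
# engine offloads from the MCU). Illustrative only -- not the paper's
# hardware implementation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W, U, b):
    """One LSTM timestep.
    x: input (n_in,); h, c: hidden/cell state (n_hid,)
    W: (4*n_hid, n_in), U: (4*n_hid, n_hid), b: (4*n_hid,)
    Gate pre-activations are packed in [input, forget, cell, output] order.
    """
    z = W @ x + U @ h + b            # the four gate matrix-vector products
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# Tiny smoke run over a 5-step sequence with random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.standard_normal((5, n_in)):
    h, c = lstm_cell(x, h, c, W, U, b)
```

Because each timestep depends on the previous hidden and cell state, the recurrence is inherently sequential; this is why offloading the per-gate matrix-vector arithmetic to a dedicated engine, rather than batching across timesteps, is the natural acceleration strategy on tinyML hardware.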

Original language: English
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers
State: Accepted/In press - 1 Jan 2025

Keywords

  • LSTM
  • LiteRT
  • TensorFlow-lite for microcontrollers
  • TinyML
  • hardware-software codesign
  • neural processing unit

ASJC Scopus subject areas

  • General Engineering
