TY - GEN
T1 - muRISCV-NN
T2 - 21st ACM International Conference on Computing Frontiers, CF 2024
AU - Van Kempen, Philipp
AU - Jones, Jefferson Parker
AU - Mueller-Gritschneder, Daniel
AU - Schlichtmann, Ulf
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/5/7
Y1 - 2024/5/7
AB - With the rapid adoption of deep learning workloads on resource-constrained edge devices, efficient and data-parallel computing paradigms are becoming increasingly important. The RISC-V ISA provides a set of vector extensions with powerful data-computation capabilities to accelerate deep learning workloads at the edge. However, the RISC-V ecosystem lacks a lightweight, open-source, and vendor-agnostic compute library that supports these extensions on embedded platforms. After porting the existing ARM Cortex-M-specific kernel implementations to the RISC-V vector ISA, we optimized the operator implementations to fully exploit the data-level parallelism offered by the supported targets. Compared to programs vectorized by LLVM's built-in auto-vectorizer, we observe a runtime advantage of up to 60% for convolutional models and large vector lengths, while introducing less ROM overhead. Furthermore, muRISCV-NN integrates well with existing ML deployment frameworks, is bit-accurate to CMSIS-NN, and can thus be used as a drop-in replacement with minimal changes to the compilation flow.
KW - Compilers
KW - Neural Network Inference
KW - RISC-V
KW - Vectorization
UR - http://www.scopus.com/inward/record.url?scp=85199145505&partnerID=8YFLogxK
DO - 10.1145/3637543.3652878
M3 - Conference contribution
AN - SCOPUS:85199145505
T3 - Proceedings of the 21st ACM International Conference on Computing Frontiers 2024 Workshops and Special Sessions, CF 2024 Companion
SP - 75
EP - 78
BT - Proceedings of the 21st ACM International Conference on Computing Frontiers 2024 Workshops and Special Sessions, CF 2024 Companion
PB - Association for Computing Machinery, Inc
Y2 - 7 May 2024 through 9 May 2024
ER -