TY - GEN
T1 - Parallel Golomb-Rice Decoder with 8-bit Unary Decoding for Weight Compression in TinyML Applications
AU - Vaddeboina, Mounika
AU - Kaja, Endri
AU - Yilmayer, Alper
AU - Prebeck, Sebastian
AU - Ecker, Wolfgang
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Due to recent advances in Artificial Intelligence (AI), the demand for AI in the Internet of Things (IoT) domain has increased exponentially. Running Deep Neural Networks (DNNs) on edge devices offers the advantages of privacy, security, and lower latency. Deploying them on embedded devices with constrained hardware resources is challenging, however, since substantial compute and memory resources are required, and memory access accounts for the majority of the energy consumption on edge devices. Although data compression plays a critical role in reducing storage and memory bandwidth requirements, most hardware decoders are inefficient in terms of power, area, and throughput. In this work, a hardware Parallel Golomb-Rice decoder is presented that can decode 8 bits of unary-encoded data every cycle. The design has been integrated with a Neural Network (NN) accelerator and evaluated with state-of-the-art benchmark models. Lossless compression is performed with an offline Golomb-Rice encoder, which encodes the weights of each layer with an optimum Golomb-Rice parameter. Applied to the Anomaly Detection, Image Classification, and Visual Wake Words benchmarks, memory access during inference is reduced by 26.8%, 6.62%, and 5.54%, respectively. The decoder dissipates 0.4216 mW of power and delivers an average throughput of 860 MBps. The design has been synthesised in 40 nm technology and compared with state-of-the-art works.
AB - Due to recent advances in Artificial Intelligence (AI), the demand for AI in the Internet of Things (IoT) domain has increased exponentially. Running Deep Neural Networks (DNNs) on edge devices offers the advantages of privacy, security, and lower latency. Deploying them on embedded devices with constrained hardware resources is challenging, however, since substantial compute and memory resources are required, and memory access accounts for the majority of the energy consumption on edge devices. Although data compression plays a critical role in reducing storage and memory bandwidth requirements, most hardware decoders are inefficient in terms of power, area, and throughput. In this work, a hardware Parallel Golomb-Rice decoder is presented that can decode 8 bits of unary-encoded data every cycle. The design has been integrated with a Neural Network (NN) accelerator and evaluated with state-of-the-art benchmark models. Lossless compression is performed with an offline Golomb-Rice encoder, which encodes the weights of each layer with an optimum Golomb-Rice parameter. Applied to the Anomaly Detection, Image Classification, and Visual Wake Words benchmarks, memory access during inference is reduced by 26.8%, 6.62%, and 5.54%, respectively. The decoder dissipates 0.4216 mW of power and delivers an average throughput of 860 MBps. The design has been synthesised in 40 nm technology and compared with state-of-the-art works.
KW - Deep Neural Networks
KW - Golomb-Rice coding
KW - Internet-of-Things
KW - Parallel decoders
UR - http://www.scopus.com/inward/record.url?scp=85189184316&partnerID=8YFLogxK
U2 - 10.1109/DSD60849.2023.00041
DO - 10.1109/DSD60849.2023.00041
M3 - Conference contribution
AN - SCOPUS:85189184316
T3 - Proceedings - 2023 26th Euromicro Conference on Digital System Design, DSD 2023
SP - 227
EP - 234
BT - Proceedings - 2023 26th Euromicro Conference on Digital System Design, DSD 2023
A2 - Niar, Smail
A2 - Ouarnoughi, Hamza
A2 - Skavhaug, Amund
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th Euromicro Conference on Digital System Design, DSD 2023
Y2 - 6 September 2023 through 8 September 2023
ER -