TY - JOUR
T1 - Thermal-Aware Design for Approximate DNN Accelerators
AU - Zervakis, Georgios
AU - Anagnostopoulos, Iraklis
AU - Alsalamin, Sami
AU - Spantidi, Ourania
AU - Roman-Ballesteros, Isai
AU - Henkel, Jörg
AU - Amrouch, Hussam
N1 - Publisher Copyright:
IEEE
PY - 2022
Y1 - 2022
N2 - Recent breakthroughs in Neural Networks (NNs) have made DNN accelerators ubiquitous and led to an ever-increasing quest to adopt them from the Cloud to edge computing. However, state-of-the-art DNN accelerators pack immense computational power into a relatively confined area, inducing significant on-chip power densities that lead to intolerable thermal bottlenecks. The existing state of the art focuses on using approximate multipliers only to trade off efficiency against inference accuracy. In this work, we present a thermal-aware approximate DNN accelerator design in which we additionally trade off approximation against temperature effects, towards designing DNN accelerators that satisfy tight temperature constraints. Using commercial multi-physics tool flows for heat simulations, we demonstrate how our thermal-aware approximate design reduces the temperature from 139°C, in an accurate circuit, down to 79°C. This enables DNN accelerators to fulfill tight thermal constraints while still maximizing performance and reducing energy by around 75%, with a negligible accuracy loss of merely 0.44% on average across a wide range of NN models. Furthermore, using physics-based transistor aging models, we demonstrate how the reductions in voltage and temperature obtained by our approximate design considerably improve the circuit's reliability. Our approximate design exhibits around 40% less aging-induced degradation compared to the baseline design.
AB - Recent breakthroughs in Neural Networks (NNs) have made DNN accelerators ubiquitous and led to an ever-increasing quest to adopt them from the Cloud to edge computing. However, state-of-the-art DNN accelerators pack immense computational power into a relatively confined area, inducing significant on-chip power densities that lead to intolerable thermal bottlenecks. The existing state of the art focuses on using approximate multipliers only to trade off efficiency against inference accuracy. In this work, we present a thermal-aware approximate DNN accelerator design in which we additionally trade off approximation against temperature effects, towards designing DNN accelerators that satisfy tight temperature constraints. Using commercial multi-physics tool flows for heat simulations, we demonstrate how our thermal-aware approximate design reduces the temperature from 139°C, in an accurate circuit, down to 79°C. This enables DNN accelerators to fulfill tight thermal constraints while still maximizing performance and reducing energy by around 75%, with a negligible accuracy loss of merely 0.44% on average across a wide range of NN models. Furthermore, using physics-based transistor aging models, we demonstrate how the reductions in voltage and temperature obtained by our approximate design considerably improve the circuit's reliability. Our approximate design exhibits around 40% less aging-induced degradation compared to the baseline design.
KW - Approximate Computing
KW - Artificial neural networks
KW - Deep Neural Networks
KW - Density measurement
KW - Internet
KW - Neural Processing Unit
KW - Power system measurements
KW - Reliability
KW - System-on-chip
KW - Systolic MAC Array
KW - Systolic arrays
KW - Temperature
KW - Thermal Design
KW - Throughput
KW - VLSI
UR - http://www.scopus.com/inward/record.url?scp=85122592719&partnerID=8YFLogxK
U2 - 10.1109/TC.2022.3141054
DO - 10.1109/TC.2022.3141054
M3 - Article
AN - SCOPUS:85122592719
SN - 0018-9340
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
ER -