TY - GEN
T1 - Temperature-Aware Memory Mapping and Active Cooling of Neural Processing Units
AU - Moghaddas, Vahidreza
AU - Kattan, Hammam
AU - Bucher, Tim
AU - Yayla, Mikail
AU - Chen, Jian Jia
AU - Amrouch, Hussam
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Neural processing units (NPUs) have become indispensable for meeting the high computational demands of deep neural networks (DNNs). They provide a very efficient solution, thanks to having a huge MAC array that enables massive parallelism. Nevertheless, such an architecture exhibits excessive on-chip power densities leading to a localized hot-spot that seriously heats its surroundings. This work demonstrates how the on-chip temperatures induced by the MAC array create a spatial thermal gradient through the on-chip SRAM memory. This makes the memory regions sensitive to different error probabilities (Perror), leading to significant accuracy drops when DNNs are being executed. To surmount this challenge, we employ on-chip superlattice thermoelectric (TEC) cooling devices that effectively reduce the memory temperature. Although scaling the memory voltage makes SRAM cells more sensitive to errors, it significantly decreases the leakage power, which compensates for the power consumed by the incorporated TEC devices. Furthermore, operating the SRAM at a lower voltage and temperature substantially increases its lifetime because voltage and temperature are key stimuli of transistor aging. By running multi-physics simulations using commercial finite-element tools and SPICE simulations for the 14nm FinFET technology, we accurately derive the relation between the Perror in different memory regions and the corresponding cooling cost. We then propose a three-stage temperature-aware layer-wise memory mapping that exploits different degrees of the sensitivity of NN layers to errors towards maximizing the DNN accuracy while minimizing the cooling cost. Experimental results reveal that our method notably improves the DNN accuracy compared to existing temperature-oblivious memory mapping.
KW - Neural processing unit (NPU)
KW - On-chip memory
KW - Thermal management
KW - Thermoelectric cooling (TEC)
UR - http://www.scopus.com/inward/record.url?scp=85173088374&partnerID=8YFLogxK
U2 - 10.1109/ISLPED58423.2023.10244458
DO - 10.1109/ISLPED58423.2023.10244458
M3 - Conference contribution
AN - SCOPUS:85173088374
T3 - Proceedings of the International Symposium on Low Power Electronics and Design
BT - 2023 IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2023
Y2 - 7 August 2023 through 8 August 2023
ER -