TY - GEN
T1 - MATAR
T2 - 2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024
AU - Mori, Pierpaolo
AU - Thoma, Moritz
AU - Frickenstein, Lukas
AU - Sampath, Shambhavi Balamuthu
AU - Fasfous, Nael
AU - Vemparala, Manoj Rohit
AU - Frickenstein, Alexander
AU - Stechele, Walter
AU - Mueller-Gritschneder, Daniel
AU - Passerone, Claudio
N1 - Publisher Copyright:
© 2024 EDAA.
PY - 2024
Y1 - 2024
AB - Quantization of deep neural networks (DNNs) reduces their memory footprint and simplifies their hardware arithmetic logic, enabling efficient inference on edge devices. Different hardware targets can support different forms of quantization, e.g., full 8-bit, 8/4/2-bit mixed-precision combinations, or fully flexible bit-serial solutions. This makes standard quantization-aware training (QAT) of a DNN for different targets challenging, as the supported quantization levels of each target must be carefully considered at training time. In this paper, we propose a generalized QAT solution that results in a DNN that can be retargeted to different hardware without any retraining or prior knowledge of the hardware's supported quantization policy. First, we present a novel training scheme that makes the model aware of multiple quantization strategies. Then, we demonstrate the retargeting capabilities of the resulting DNN by using a genetic algorithm to search for layer-wise, mixed-precision solutions that maximize performance and/or accuracy on the hardware target, without the need for fine-tuning. By making the DNN agnostic of the final hardware target, our method allows DNNs to be distributed to many users on different hardware platforms without the DNN developers having to share their training loop or dataset, and without the end-users of the efficient quantized solution having to detail their hardware capabilities ahead of time. Models trained with our approach generalize across multiple quantization policies with minimal accuracy degradation compared to target-specific quantization counterparts.
UR - http://www.scopus.com/inward/record.url?scp=85196482119&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85196482119
T3 - Proceedings - Design, Automation and Test in Europe, DATE
BT - 2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 25 March 2024 through 27 March 2024
ER -