TY - GEN
T1 - WinoTrain
T2 - 60th ACM/IEEE Design Automation Conference, DAC 2023
AU - Mori, Pierpaolo
AU - Sampath, Shambhavi Balamuthu
AU - Frickenstein, Lukas
AU - Vemparala, Manoj Rohit
AU - Fasfous, Nael
AU - Frickenstein, Alexander
AU - Stechele, Walter
AU - Passerone, Claudio
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Efficient inference is critical to realizing a low-power, real-time implementation of convolutional neural networks (CNNs) on compute- and memory-constrained embedded platforms. Using quantization techniques and fast convolution algorithms such as Winograd, CNN inference can achieve benefits in both latency and energy consumption. Performing Winograd convolution involves (1) transforming the weights and activations to the Winograd domain, (2) performing element-wise multiplication on the transformed tensors, and (3) transforming the results back to the conventional spatial domain. Combining Winograd with quantization of all its steps results in severe accuracy degradation due to numerical instability. In this paper, we propose a simple quantization-aware training technique that quantizes all three steps of the Winograd convolution while using a minimal number of scaling factors. Additionally, we propose an FPGA accelerator employing tiling and unrolling methods to highlight the performance benefits of the fully 8-bit quantized Winograd algorithm. We achieve a 2× reduction in inference time compared to standard convolution on ResNet-18 for the ImageNet dataset, while improving the Top-1 accuracy by 55.7 p.p. compared to a standard post-training quantized Winograd variant of the network.
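For reference, the three Winograd steps named in the abstract can be sketched as follows. This is a minimal NumPy illustration of standard F(2×2, 3×3) Winograd convolution (transform, element-wise multiply, inverse transform) in floating point; it is not the authors' quantization-aware training or FPGA accelerator code, and the function name `winograd_f2x2_3x3` and the sanity check are illustrative additions.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices (Lavin & Gray style).
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
Bt = np.array([[1.0,  0.0, -1.0,  0.0],
               [0.0,  1.0,  1.0,  0.0],
               [0.0, -1.0,  1.0,  0.0],
               [0.0,  1.0,  0.0, -1.0]])
At = np.array([[1.0, 1.0,  1.0,  0.0],
               [0.0, 1.0, -1.0, -1.0]])

def winograd_f2x2_3x3(d, g):
    """One 4x4 input tile `d` and one 3x3 filter `g` -> 2x2 output tile."""
    # (1) Transform the filter and the input tile to the Winograd domain.
    U = G @ g @ G.T        # 4x4 transformed filter
    V = Bt @ d @ Bt.T      # 4x4 transformed input tile
    # (2) Element-wise multiplication of the transformed tensors.
    M = U * V
    # (3) Transform the result back to the spatial domain.
    return At @ M @ At.T   # 2x2 output tile

# Sanity check against direct 'valid' 3x3 convolution
# (cross-correlation, as used in CNN layers).
rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
direct = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)]
                   for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), direct)
```

Per the abstract, the paper's contribution is to quantize all three of these steps to 8 bits with a minimal number of scaling factors via quantization-aware training; the float sketch above deliberately omits that quantization.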
UR - http://www.scopus.com/inward/record.url?scp=85173100271&partnerID=8YFLogxK
U2 - 10.1109/DAC56929.2023.10247805
DO - 10.1109/DAC56929.2023.10247805
M3 - Conference contribution
AN - SCOPUS:85173100271
T3 - Proceedings - Design Automation Conference
BT - 2023 60th ACM/IEEE Design Automation Conference, DAC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 July 2023 through 13 July 2023
ER -