TY - GEN
T1 - Lightweight Instruction Set for Flexible Dilated Convolutions and Mixed-Precision Operands
AU - Friedrich, Simon
AU - Sampath, Shambhavi Balamuthu
AU - Wittig, Robert
AU - Vemparala, Manoj Rohit
AU - Fasfous, Nael
AU - Matus, Emil
AU - Stechele, Walter
AU - Fettweis, Gerhard
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Modern deep neural networks specialized for object detection and semantic segmentation require specific operations to increase or preserve the resolution of their feature maps. Hence, more generic convolution layers, namely transposed and dilated convolutions, are employed, adding a large number of zeros between the elements of the input features or weights. Standard neural network hardware accelerators usually process these convolutions in a straightforward manner, without skipping the added zeros, resulting in an increased computation time. To cope with this problem, recent works propose to skip the redundant elements with additional hardware or solve the problem efficiently only for a limited range of dilation rates. We present a general approach for accelerating transposed and dilated convolutions that does not introduce any hardware overhead while supporting all dilation rates. To achieve this, we introduce a novel precision-scalable lightweight instruction set and memory scheme that can be applied to the different convolution variants. This results in a speed-up of 5x for DeepLabV3+, outperforming recently proposed design methods. The support of precision-scalable execution for all workloads further increases the speed-up in computation time, as shown for the PointPillars, DeepLabV3+, and ENet networks. Compared to the state-of-the-art commercial EdgeTPU, the instruction footprint of ResNet-50 on our accelerator is reduced by 60 percent.
AB - Modern deep neural networks specialized for object detection and semantic segmentation require specific operations to increase or preserve the resolution of their feature maps. Hence, more generic convolution layers, namely transposed and dilated convolutions, are employed, adding a large number of zeros between the elements of the input features or weights. Standard neural network hardware accelerators usually process these convolutions in a straightforward manner, without skipping the added zeros, resulting in an increased computation time. To cope with this problem, recent works propose to skip the redundant elements with additional hardware or solve the problem efficiently only for a limited range of dilation rates. We present a general approach for accelerating transposed and dilated convolutions that does not introduce any hardware overhead while supporting all dilation rates. To achieve this, we introduce a novel precision-scalable lightweight instruction set and memory scheme that can be applied to the different convolution variants. This results in a speed-up of 5x for DeepLabV3+, outperforming recently proposed design methods. The support of precision-scalable execution for all workloads further increases the speed-up in computation time, as shown for the PointPillars, DeepLabV3+, and ENet networks. Compared to the state-of-the-art commercial EdgeTPU, the instruction footprint of ResNet-50 on our accelerator is reduced by 60 percent.
KW - DNN
KW - accelerator
KW - address generation
KW - dilated convolution
KW - instruction set
KW - memory alignment
KW - mixed-precision
KW - stride
KW - transposed convolution
UR - http://www.scopus.com/inward/record.url?scp=85161616984&partnerID=8YFLogxK
U2 - 10.1109/ISQED57927.2023.10129341
DO - 10.1109/ISQED57927.2023.10129341
M3 - Conference contribution
AN - SCOPUS:85161616984
T3 - Proceedings - International Symposium on Quality Electronic Design, ISQED
BT - Proceedings of the 24th International Symposium on Quality Electronic Design, ISQED 2023
PB - IEEE Computer Society
T2 - 24th International Symposium on Quality Electronic Design, ISQED 2023
Y2 - 5 April 2023 through 7 April 2023
ER -