TY - GEN
T1 - Accelerating and pruning CNNs for semantic segmentation on FPGA
AU - Morì, Pierpaolo
AU - Vemparala, Manoj Rohit
AU - Fasfous, Nael
AU - Mitra, Saptarshi
AU - Sarkar, Sreetama
AU - Frickenstein, Alexander
AU - Frickenstein, Lukas
AU - Helms, Domenik
AU - Nagaraja, Naveen Shankar
AU - Stechele, Walter
AU - Passerone, Claudio
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/7/10
Y1 - 2022/7/10
AB - Semantic segmentation is a popular task in computer vision, providing pixel-wise annotations for scene understanding. However, segmentation-based convolutional neural networks require tremendous computational power. In this work, a fully pipelined hardware accelerator with support for dilated convolution is introduced, which cuts down on redundant zero multiplications. Furthermore, we propose a genetic-algorithm-based automated channel pruning technique to jointly optimize computational complexity and model accuracy. Finally, hardware heuristics and an accurate model of the custom accelerator design enable a hardware-aware pruning framework. We achieve 2.44X lower latency with minimal degradation in semantic prediction quality (1.98 pp lower mean intersection over union) compared to the baseline DeepLabV3+ model, evaluated on an Arria-10 FPGA. The binary files of the FPGA design and of the baseline and pruned models can be found at github.com/pierpaolomori/SemanticSegmentationFPGA
UR - https://www.scopus.com/pages/publications/85137539053
U2 - 10.1145/3489517.3530424
DO - 10.1145/3489517.3530424
M3 - Conference contribution
AN - SCOPUS:85137539053
T3 - Proceedings - Design Automation Conference
SP - 145
EP - 150
BT - Proceedings of the 59th ACM/IEEE Design Automation Conference, DAC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 59th ACM/IEEE Design Automation Conference, DAC 2022
Y2 - 10 July 2022 through 14 July 2022
ER -