TY - JOUR
T1 - HW-Flow-Fusion
T2 - Inter-Layer Scheduling for Convolutional Neural Network Accelerators with Dataflow Architectures
AU - Valpreda, Emanuele
AU - Morì, Pierpaolo
AU - Fasfous, Nael
AU - Vemparala, Manoj Rohit
AU - Frickenstein, Alexander
AU - Frickenstein, Lukas
AU - Stechele, Walter
AU - Passerone, Claudio
AU - Masera, Guido
AU - Martina, Maurizio
N1 - Publisher Copyright: © 2022 by the authors.
PY - 2022/9
Y1 - 2022/9
N2 - Energy and throughput efficient acceleration of convolutional neural networks (CNN) on devices with a strict power budget is achieved by leveraging different scheduling techniques to minimize data movement and maximize data reuse. Several dataflow mapping frameworks have been developed to explore the optimal scheduling of CNN layers on reconfigurable accelerators. However, previous works usually optimize each layer singularly, without leveraging the data reuse between the layers of CNNs. In this work, we present an analytical model to achieve efficient data reuse by searching for efficient scheduling of communication and computation across layers. We call this inter-layer scheduling framework HW-Flow-Fusion, as we explore the fused map-space of multiple layers sharing the available resources of the same accelerator, investigating the constraints and trade-offs of mapping the execution of multiple workloads with data dependencies. We propose a memory-efficient data reuse model, tiling, and resource partitioning strategies to fuse multiple layers without recomputation. Compared to standard single-layer scheduling, inter-layer scheduling can reduce the communication volume by 51% and 53% for selected VGG16-E and ResNet18 layers on a spatial array accelerator, and reduce the latency by 39% and 34% respectively, while also increasing the computation to communication ratio which improves the memory bandwidth efficiency.
AB - Energy and throughput efficient acceleration of convolutional neural networks (CNN) on devices with a strict power budget is achieved by leveraging different scheduling techniques to minimize data movement and maximize data reuse. Several dataflow mapping frameworks have been developed to explore the optimal scheduling of CNN layers on reconfigurable accelerators. However, previous works usually optimize each layer singularly, without leveraging the data reuse between the layers of CNNs. In this work, we present an analytical model to achieve efficient data reuse by searching for efficient scheduling of communication and computation across layers. We call this inter-layer scheduling framework HW-Flow-Fusion, as we explore the fused map-space of multiple layers sharing the available resources of the same accelerator, investigating the constraints and trade-offs of mapping the execution of multiple workloads with data dependencies. We propose a memory-efficient data reuse model, tiling, and resource partitioning strategies to fuse multiple layers without recomputation. Compared to standard single-layer scheduling, inter-layer scheduling can reduce the communication volume by 51% and 53% for selected VGG16-E and ResNet18 layers on a spatial array accelerator, and reduce the latency by 39% and 34% respectively, while also increasing the computation to communication ratio which improves the memory bandwidth efficiency.
KW - DNN
KW - accelerator
KW - dataflow
KW - layer-fusion
KW - memory hierarchy
KW - scheduling
UR - http://www.scopus.com/inward/record.url?scp=85138648683&partnerID=8YFLogxK
U2 - 10.3390/electronics11182933
DO - 10.3390/electronics11182933
M3 - Article
AN - SCOPUS:85138648683
SN - 2079-9292
VL - 11
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 18
M1 - 2933
ER -