TY - JOUR
T1 - DySHARQ
T2 - Dynamic Software-Defined Hardware-Managed Queues for Tile-Based Architectures
AU - Rheindt, Sven
AU - Maier, Sebastian
AU - Pohle, Nora
AU - Nolte, Lars
AU - Lenke, Oliver
AU - Schmaus, Florian
AU - Wild, Thomas
AU - Schröder-Preikschat, Wolfgang
AU - Herkersdorf, Andreas
N1 - Publisher Copyright: © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2021/8
Y1 - 2021/8
N2 - The recent trend towards tile-based manycore architectures has helped to tackle the memory wall by physically distributing memories and processing nodes. However, this distribution introduces a data-to-task locality challenge, and inter-tile communication often imposes significant software overhead. We therefore proposed SHARQ: software-defined, hardware-managed queues that enable efficient inter-tile communication through user-defined queues with arbitrarily sized elements. To ensure (remote) processing of queued elements, SHARQ introduces an optional handler task, which is scheduled by hardware on demand. Queue management, intra- and inter-tile data transfer, and handler task invocation are handled entirely by hardware. Only rare tasks, such as dynamic queue creation at run-time, are performed in software. DySHARQ, an extension of SHARQ, enables dynamic and concurrent queue memory management and queue length adjustment in order to adapt to changes in application and resource requirements. The DySHARQ hardware monitors the queue memory requirements at run-time and conditionally schedules a software-defined memory management task. It further optimizes the hardware-software interaction for local queue operations. We integrated DySHARQ into the MPI library used by the NAS benchmarks. The evaluation shows a reduction in execution time of up to 43% (compared to software) for the communication-intensive IS kernel in a 4 × 4 tile design on an FPGA platform with a total of 80 LEON3 cores. The dynamic memory management reduces the memory footprint by 3.75× in a 2 × 2 design.
KW - Data-to-task locality
KW - Distributed manycore architecture
KW - Hardware-accelerated queue
KW - Hardware-software co-design
KW - Inter-tile communication
UR - http://www.scopus.com/inward/record.url?scp=85096373038&partnerID=8YFLogxK
U2 - 10.1007/s10766-020-00687-7
DO - 10.1007/s10766-020-00687-7
M3 - Article
AN - SCOPUS:85096373038
SN - 0885-7458
VL - 49
SP - 506
EP - 540
JO - International Journal of Parallel Programming
JF - International Journal of Parallel Programming
IS - 4
ER -