TY - GEN
T1 - SHARQ
T2 - 19th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2019
AU - Rheindt, Sven
AU - Maier, Sebastian
AU - Schmaus, Florian
AU - Wild, Thomas
AU - Schröder-Preikschat, Wolfgang
AU - Herkersdorf, Andreas
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2019.
PY - 2019
Y1 - 2019
AB - The recent trend towards tile-based manycore architectures has helped to tackle the memory wall by physically distributing memories and processing nodes. Distributed operating systems and applications make it possible to exploit the increased scalability of such architectures, but they still face the data-to-task locality challenge. As inter-tile communication, thread synchronization and data transport often impose significant software overhead on such architectures, many applications would benefit from a more efficient and powerful communication primitive with minimal software involvement. We propose software-defined hardware-managed queues (SHARQ) for distributed computing architectures, which enable efficient inter-tile communication by leveraging application-specific queues with arbitrarily sized elements. To ensure (remote) processing of queued elements, SHARQ introduces the concept of an optional handler task, which is scheduled by hardware on demand. Queue and memory management, intra- and inter-tile data transfer, and handler task invocation are handled entirely by hardware; only the dynamic creation of queues at runtime is performed in software. As an example use case, we integrated SHARQ into the MPI library. An evaluation with the MPI-based NAS benchmarks shows a reduction in execution time of up to 48% for the communication-intensive IS kernel in a 4 x 4 tile design on an FPGA platform with a total of 80 LEON3 cores.
KW - Distributed architectures
KW - Hardware accelerator
KW - Hardware/software codesign
KW - Inter-tile communication
KW - MPMC queue
KW - NoC
UR - http://www.scopus.com/inward/record.url?scp=85069213748&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-27562-4_15
DO - 10.1007/978-3-030-27562-4_15
M3 - Conference contribution
AN - SCOPUS:85069213748
SN - 9783030275617
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 212
EP - 225
BT - Embedded Computer Systems
A2 - Pnevmatikatos, Dionisios N.
A2 - Pelcat, Maxime
A2 - Jung, Matthias
PB - Springer Verlag
Y2 - 7 July 2019 through 11 July 2019
ER -