TY - GEN
T1 - Optimizing computation-communication overlap in asynchronous task-based programs
AU - Castillo, Emilio
AU - Jain, Nikhil
AU - Casas, Marc
AU - Moreto, Miquel
AU - Schulz, Martin
AU - Beivide, Ramon
AU - Valero, Mateo
AU - Bhatele, Abhinav
N1 - Publisher Copyright:
© 2019 Copyright held by the owner/author(s).
PY - 2019/2/16
Y1 - 2019/2/16
N2 - Asynchronous task-based programming models are gaining popularity to address programmability and performance challenges in high performance computing. One of the main attractions of these models and runtimes is their potential to automatically expose and exploit overlap of computation with communication. However, inefficient interactions between such programming models and the underlying messaging layer (in most cases, MPI) limit the achievable computation-communication overlap and negatively impact the performance of parallel programs. We propose to expose information about MPI internals to a task-based runtime system to make better scheduling decisions. In particular, we show how existing mechanisms used to profile MPI implementations can be used to share information between MPI and a task-based runtime. Further, an evaluation of the proposed method shows performance improvements of up to 30.7% for applications with collective communication.
AB - Asynchronous task-based programming models are gaining popularity to address programmability and performance challenges in high performance computing. One of the main attractions of these models and runtimes is their potential to automatically expose and exploit overlap of computation with communication. However, inefficient interactions between such programming models and the underlying messaging layer (in most cases, MPI) limit the achievable computation-communication overlap and negatively impact the performance of parallel programs. We propose to expose information about MPI internals to a task-based runtime system to make better scheduling decisions. In particular, we show how existing mechanisms used to profile MPI implementations can be used to share information between MPI and a task-based runtime. Further, an evaluation of the proposed method shows performance improvements of up to 30.7% for applications with collective communication.
UR - http://www.scopus.com/inward/record.url?scp=85064194321&partnerID=8YFLogxK
U2 - 10.1145/3293883.3295720
DO - 10.1145/3293883.3295720
M3 - Conference contribution
AN - SCOPUS:85064194321
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP
SP - 415
EP - 416
BT - PPoPP 2019 - Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
PB - Association for Computing Machinery
T2 - 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2019
Y2 - 16 February 2019 through 20 February 2019
ER -