TY - GEN
T1 - Infrastructure and API extensions for elastic execution of MPI applications
AU - Comprés, Isaías
AU - Mo-Hellenbrand, Ao
AU - Gerndt, Michael
AU - Bungartz, Hans Joachim
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/9/25
Y1 - 2016/9/25
AB - Dynamic Processes support was added to MPI in version 2.0 of the standard. This feature of MPI has not been widely used by application developers, in part due to the performance cost and limitations of the spawn operation. In this paper, we propose an extension to MPI that consists of four new operations. These operations allow an application to be initialized in an elastic mode of execution and to enter an adaptation window when necessary, during which resources are incorporated into or released from the application's world communicator. A prototype solution based on the MPICH library and the SLURM resource manager is presented and evaluated alongside an elastic scientific application that makes use of the new MPI extensions. The cost of these new operations is shown to be negligible, due mainly to the latency-hiding design, leaving the application's time for data redistribution as the only significant performance cost.
KW - Elastic computing
KW - MPI
KW - MPICH
KW - Malleable applications
KW - Message passing
KW - Resource aware computing
KW - SLURM
UR - http://www.scopus.com/inward/record.url?scp=84995611733&partnerID=8YFLogxK
U2 - 10.1145/2966884.2966917
DO - 10.1145/2966884.2966917
M3 - Conference contribution
AN - SCOPUS:84995611733
T3 - ACM International Conference Proceeding Series
SP - 82
EP - 97
BT - Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016
PB - Association for Computing Machinery
T2 - 23rd European MPI Users' Group Meeting, EuroMPI 2016
Y2 - 25 September 2016 through 28 September 2016
ER -