TY - GEN
T1 - MUST
T2 - 3rd International Workshop on Parallel Tools for High Performance Computing, HPC 2009
AU - Hilbrich, Tobias
AU - Schulz, Martin
AU - De Supinski, Bronis R.
AU - Müller, Matthias S.
PY - 2010
Y1 - 2010
N2 - The Message-Passing Interface (MPI) is large and complex. Therefore, programming MPI is error prone. Several MPI runtime correctness tools address classes of usage errors, such as deadlocks or non-portable constructs. To our knowledge none of these tools scales to more than about 100 processes. However, some of the current HPC systems use more than 100,000 cores and future systems are expected to use far more. Since errors often depend on the task count used, we need correctness tools that scale to the full system size. We present a novel framework for scalable MPI correctness tools to address this need. Our fine-grained, module-based approach supports rapid prototyping and allows correctness tools built upon it to adapt to different architectures and use cases. The design uses PnMPI to instantiate a tool from a set of individual modules. We present an overview of our design, along with first performance results for a proof of concept implementation.
AB - The Message-Passing Interface (MPI) is large and complex. Therefore, programming MPI is error prone. Several MPI runtime correctness tools address classes of usage errors, such as deadlocks or non-portable constructs. To our knowledge none of these tools scales to more than about 100 processes. However, some of the current HPC systems use more than 100,000 cores and future systems are expected to use far more. Since errors often depend on the task count used, we need correctness tools that scale to the full system size. We present a novel framework for scalable MPI correctness tools to address this need. Our fine-grained, module-based approach supports rapid prototyping and allows correctness tools built upon it to adapt to different architectures and use cases. The design uses PnMPI to instantiate a tool from a set of individual modules. We present an overview of our design, along with first performance results for a proof of concept implementation.
UR - http://www.scopus.com/inward/record.url?scp=84885227509&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-11261-4_5
DO - 10.1007/978-3-642-11261-4_5
M3 - Conference contribution
AN - SCOPUS:84885227509
SN - 9783642112607
T3 - Proceedings of the 3rd International Workshop on Parallel Tools for High Performance Computing 2009
SP - 53
EP - 66
BT - Proceedings of the 3rd International Workshop on Parallel Tools for High Performance Computing 2009
PB - Springer Verlag
Y2 - 14 September 2009 through 15 September 2009
ER -