TY - GEN
T1 - MPI runtime error detection with MUST
T2 - 2012 24th International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
AU - Hilbrich, Tobias
AU - Protze, Joachim
AU - Schulz, Martin
AU - De Supinski, Bronis R.
AU - Müller, Matthias S.
PY - 2012
Y1 - 2012
N2 - The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST) that detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions. Our enhancements also check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce runtime deadlock detection overheads. Existing approaches often require O(p) analysis time per MPI operation, for p processes. We empirically observe that our improvements lead to sub-linear or better analysis time per operation for a wide range of real world applications.
AB - The widely used Message Passing Interface (MPI) is complex and rich. As a result, application developers require automated tools to avoid and to detect MPI programming errors. We present the Marmot Umpire Scalable Tool (MUST) that detects such errors with significantly increased scalability. We present improvements to our graph-based deadlock detection approach for MPI, which cover future MPI extensions. Our enhancements also check complex MPI constructs that no previous graph-based detection approach handled correctly. Finally, we present optimizations for the processing of MPI operations that reduce runtime deadlock detection overheads. Existing approaches often require O(p) analysis time per MPI operation, for p processes. We empirically observe that our improvements lead to sub-linear or better analysis time per operation for a wide range of real world applications.
UR - http://www.scopus.com/inward/record.url?scp=84877697102&partnerID=8YFLogxK
U2 - 10.1109/SC.2012.79
DO - 10.1109/SC.2012.79
M3 - Conference contribution
AN - SCOPUS:84877697102
SN - 9781467308069
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - 2012 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2012
Y2 - 10 November 2012 through 16 November 2012
ER -