TY - JOUR
T1 - PredCom
T2 - A Predictive Approach to Collecting Approximated Communication Traces
AU - Miwa, Shinobu
AU - Laguna, Ignacio
AU - Schulz, Martin
N1 - Publisher Copyright: © 1990-2012 IEEE.
PY - 2021/1/1
Y1 - 2021/1/1
AB - Communication traces collected from MPI applications are an important source of information for performance optimization as they can help analysts determine communication patterns and identify inefficiencies. However, their collection, especially at scale, is time consuming, since it usually requires running the complete target application on a large number of nodes. In this work, we present PredCom, a tool-chain to generate a predictive communication proxy based on information gathered from a few small scale runs, which allows us to extract approximate communication traces with an accuracy high enough for most analysis goals. For this, we combine LLVM passes on the original source code (to capture static program structure) with parameter prediction (to capture dynamic and scaling behavior). This approach drastically reduces the time needed for collecting the communication traces, even for traces on large numbers of MPI processes. We demonstrate that PredCom generates communication traces of various applications up to 1612x faster with an accuracy loss of 0.11 on average compared to the original large-scale traces, and we show that the generated traces can be used to optimize process placement.
KW - Communication traces
KW - LLVM
KW - MPI
UR - http://www.scopus.com/inward/record.url?scp=85089604113&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2020.3011121
DO - 10.1109/TPDS.2020.3011121
M3 - Article
AN - SCOPUS:85089604113
SN - 1045-9219
VL - 32
SP - 45
EP - 58
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 1
M1 - 9146385
ER -