TY - GEN
T1 - Scalable compression and replay of communication traces in massively parallel environments
AU - Noeth, Michael
AU - Marathe, Jaydeep
AU - Mueller, Frank
AU - Schulz, Martin
AU - De Supinski, Bronis
PY - 2006
Y1 - 2006
N2 - Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code and system complexity as well as the time to execute such codes. An alternative to running actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless scalable trace collection, we contribute an approach that provides near constant-size communication traces regardless of the number of nodes while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events and present results of our implementation. Given this novel capability, we discuss its impact on communication tuning and beyond.
AB - Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code and system complexity as well as the time to execute such codes. An alternative to running actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless scalable trace collection, we contribute an approach that provides near constant-size communication traces regardless of the number of nodes while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events and present results of our implementation. Given this novel capability, we discuss its impact on communication tuning and beyond.
UR - http://www.scopus.com/inward/record.url?scp=34548234976&partnerID=8YFLogxK
U2 - 10.1145/1188455.1188605
DO - 10.1145/1188455.1188605
M3 - Conference contribution
AN - SCOPUS:34548234976
SN - 0769527000
SN - 9780769527000
T3 - Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC'06
BT - Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC'06
ER -