ScalaTrace: Tracing, analysis and modeling of HPC codes at scale

Frank Mueller, Xing Wu, Martin Schulz, Bronis R. De Supinski, Todd Gamblin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Characterizing the communication behavior of large-scale applications is a difficult and costly task because of code and system complexity and long execution times. An alternative to running the actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless, scalable trace collection, we contribute an approach that provides communication traces that are orders of magnitude smaller, if not near constant in size, regardless of the number of nodes, while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events, develop a scheme to preserve the time and causality of communication events, and present results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and on trace extrapolation. To the best of our knowledge, such a concise, scalable representation of MPI traces combined with time-preserving deterministic MPI call replay is without precedent.

Original language: English
Title of host publication: Applied Parallel and Scientific Computing - 10th International Conference, PARA 2010, Revised Selected Papers
Pages: 410-418
Number of pages: 9
Edition: PART 2
State: Published - 2012
Externally published: Yes
Event: 10th International Conference on Applied Parallel and Scientific Computing, PARA 2010 - Reykjavik, Iceland
Duration: 6 Jun 2010 to 9 Jun 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 7134 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on Applied Parallel and Scientific Computing, PARA 2010
Country/Territory: Iceland
City: Reykjavik
Period: 6/06/10 to 9/06/10

Keywords

  • High-Performance Computing
  • Message Passing
  • Tracing
