Stack trace analysis for large scale debugging

Dorian C. Arnold, Dong H. Ahn, Bronis R. De Supinski, Gregory L. Lee, Barton P. Miller, Martin Schulz

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

121 Scopus citations

Abstract

We present the Stack Trace Analysis Tool (STAT) to aid in debugging extreme-scale applications. STAT can reduce problem exploration spaces from thousands of processes to a few by sampling stack traces to form process equivalence classes, groups of processes exhibiting similar behavior. We can then use full-featured debuggers on representatives from these behavior classes for root cause analysis. STAT scalably collects stack traces over a sampling period to assemble a profile of the application's behavior. STAT routines process the samples to form a call graph prefix tree that encodes common behavior classes over the program's process space and time. STAT leverages MRNet, an infrastructure for tool control and data analyses, to over-come scalability barriers faced by heavy-weight debuggers. We present STAT's design and an evaluation that shows STAT gathers informative process traces from thousands of processes with sub-second latencies, a significant improvement over existing tools. Our case studies of production codes verify that STAT supports the quick identification of errors that were previously difficult to locate.

Original languageEnglish
Title of host publicationProceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM
DOIs
StatePublished - 2007
Externally publishedYes
Event21st International Parallel and Distributed Processing Symposium, IPDPS 2007 - Long Beach, CA, United States
Duration: 26 Mar 200730 Mar 2007

Publication series

NameProceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM

Conference

Conference21st International Parallel and Distributed Processing Symposium, IPDPS 2007
Country/TerritoryUnited States
CityLong Beach, CA
Period26/03/0730/03/07

Fingerprint

Dive into the research topics of 'Stack trace analysis for large scale debugging'. Together they form a unique fingerprint.

Cite this