Scalable temporal order analysis for large scale debugging

Dong H. Ahn, Bronis R. De Supinski, Ignacio Laguna, Gregory L. Lee, Ben Liblit, Barton P. Miller, Martin Schulz

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

38 Scopus citations

Abstract

We present a scalable temporal order analysis technique that supports debugging of large scale applications by classifying MPI tasks based on their logical program execution order. Our approach combines static analysis techniques with dynamic analysis to determine this temporal order scalably. It uses scalable stack trace analysis techniques to guide selection of critical program execution points in anomalous application runs. Our novel temporal ordering engine then leverages this information along with the application's static control structure to apply data flow analysis techniques to determine key application data such as loop control variables. We then use lightweight techniques to gather the dynamic data that determines the temporal order of the MPI tasks. Our evaluation, which extends the Stack Trace Analysis Tool (STAT), demonstrates that this temporal order analysis technique can isolate bugs in benchmark codes with injected faults as well as a real world hang case with AMG2006.
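The classification step described in the abstract can be pictured with a small sketch. This is not the STAT implementation and is not taken from the paper. It assumes, purely for illustration, that each MPI task has already reported a progress tuple made of its loop control variable values followed by a statement index, and that all tasks are stopped inside the same loop nest so their tuples are directly comparable. The function name `classify_by_progress` and the sample data are hypothetical.

```python
from collections import defaultdict


def classify_by_progress(task_progress):
    """Group task ranks by progress tuple, least-progressed group first.

    task_progress maps an MPI rank to a tuple such as
    (outer_loop_iteration, inner_loop_iteration, statement_index).
    Assumption: every task reports a tuple of the same shape, so the
    tuples form a total order. Tasks stopped in different loop nests
    would only be partially ordered and are not handled here.
    """
    groups = defaultdict(list)
    for rank, progress in task_progress.items():
        groups[tuple(progress)].append(rank)
    # Python compares tuples lexicographically, so the outermost loop
    # counter dominates, then the inner counter, then the statement.
    return [(progress, sorted(ranks)) for progress, ranks in sorted(groups.items())]


# Hypothetical sample: rank 2 is still in outer iteration 4 while the
# other ranks have reached outer iteration 5.
sample = {
    0: (5, 2, 1),
    1: (5, 2, 1),
    2: (4, 7, 3),
    3: (5, 2, 1),
}

ordered = classify_by_progress(sample)
least_progress, lagging_ranks = ordered[0]
print(lagging_ranks)  # [2]
```

In a hang, the group with the least progress is a natural first place to look, since the other tasks may simply be waiting on it.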

Original language: English
Title of host publication: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09
State: Published - 2009
Externally published: Yes
Event: Conference on High Performance Computing Networking, Storage and Analysis, SC '09 - Portland, OR, United States
Duration: 14 Nov 2009 to 20 Nov 2009

Publication series

Name: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09

Conference

Conference: Conference on High Performance Computing Networking, Storage and Analysis, SC '09
Country/Territory: United States
City: Portland, OR
Period: 14/11/09 to 20/11/09
