Clock delta compression for scalable order-replay of non-deterministic parallel applications

Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Martin Schulz

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

19 Scopus citations

Abstract

The ability to record and replay program execution helps significantly in debugging non-deterministic MPI applications by reproducing message-receive orders. However, the large amount of data that traditional record-and-reply techniques record precludes its practical applicability to massively parallel applications. In this paper, we propose a new compression algorithm, Clock Delta Compression (CDC), for scalable record and replay of non-deterministic MPI applications. CDC defines a reference order of message receives based on a totally ordered relation using Lamport clocks, and only records the differences between this reference logical-clock order and an observed order. Our evaluation shows that CDC significantly reduces the record data size. For example, when we apply CDC to Monte Carlo particle transport Benchmark (MCB), which represents common non-deterministic communication patterns, CDC reduces the record size by approximately two orders of magnitude compared to traditional techniques and incurs between 13.1% and 25.5% of runtime overhead.

Original languageEnglish
Title of host publicationProceedings of SC 2015
Subtitle of host publicationThe International Conference for High Performance Computing, Networking, Storage and Analysis
PublisherIEEE Computer Society
ISBN (Electronic)9781450337236
DOIs
StatePublished - 15 Nov 2015
Externally publishedYes
EventInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015 - Austin, United States
Duration: 15 Nov 201520 Nov 2015

Publication series

NameInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume15-20-November-2015
ISSN (Print)2167-4329
ISSN (Electronic)2167-4337

Conference

ConferenceInternational Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015
Country/TerritoryUnited States
CityAustin
Period15/11/1520/11/15

Keywords

  • compression
  • debugging tools
  • non-determinism

Fingerprint

Dive into the research topics of 'Clock delta compression for scalable order-replay of non-deterministic parallel applications'. Together they form a unique fingerprint.

Cite this