Runtime MPI collective checking with tree-based overlay networks

Tobias Hilbrich, Fabian Hänsel, Martin Schulz, Bronis R. De Supinski, Matthias S. Müller, Wolfgang E. Nagel

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

12 Scopus citations

Abstract

Runtime error detection tools detect many classes of MPI usage errors, including errors in collective communication calls. However, they often face scalability challenges. We present runtime checks for MPI collective operations that use a Tree-Based Overlay Network (TBON) for scalability and that provide full datatype matching. While we can use transitive correctness properties for most checks, some collective operations impose non-transitive correctness properties, e.g., MPI_Alltoallv, where we use intralayer communication within the TBON to distribute datatype matching information. An overhead study with stress tests and two benchmark suites demonstrates applicability and scalability at 4,096, 2,048, and 16,384 processes, respectively.
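
The idea of a transitive check aggregated over a tree can be illustrated with a minimal sketch. This is not the authors' implementation; the `CallRecord` type, its fields, and the `check_subtree` function are hypothetical names chosen for illustration. The sketch assumes that each rank's collective call can be summarized by an operation name, a root, and a flattened datatype signature, and that two calls match when these summaries are equal. Because equality is transitive, each tree node only has to compare one representative record per child and forward a single representative upward.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical summary of one rank's collective call."""
    rank: int
    op: str                     # e.g. "MPI_Bcast"
    root: int
    type_sig: Tuple[str, ...]   # flattened datatype signature


# A tree node is either a leaf record or a list of child nodes.
Node = Union[CallRecord, List["Node"]]


def same_call(a: CallRecord, b: CallRecord) -> bool:
    """Transitive matching rule: all summarized fields must be equal."""
    return (a.op, a.root, a.type_sig) == (b.op, b.root, b.type_sig)


def check_subtree(node: Node) -> Tuple[Optional[CallRecord], List[str]]:
    """Return (representative record, mismatch messages) for a subtree."""
    if isinstance(node, CallRecord):
        return node, []
    rep: Optional[CallRecord] = None
    errors: List[str] = []
    for child in node:
        child_rep, child_errors = check_subtree(child)
        errors.extend(child_errors)
        if child_rep is None:
            continue
        if rep is None:
            rep = child_rep          # first child supplies the representative
        elif not same_call(rep, child_rep):
            errors.append(
                f"rank {child_rep.rank} mismatches rank {rep.rank}"
            )
    return rep, errors
```

A check such as the one the abstract mentions for MPI_Alltoallv does not fit this pattern, since each pair of ranks has its own matching condition; the sketch covers only the transitive case.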

Original language: English
Title of host publication: Proceedings of the 20th European MPI Users' Group Meeting, EuroMPI 2013
Publisher: Association for Computing Machinery
Pages: 129-134
Number of pages: 6
ISBN (Print): 9788461651337
State: Published - 2013
Externally published: Yes
Event: 20th European MPI Users' Group Meeting, EuroMPI 2013 - Madrid, Spain
Duration: 15 Sep 2013 to 18 Sep 2013

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 20th European MPI Users' Group Meeting, EuroMPI 2013
Country/Territory: Spain
City: Madrid
Period: 15/09/13 to 18/09/13

Keywords

  • Correctness
  • MPI collectives
  • Tree-based overlay networks
