ScrubJay: Deriving Knowledge from the Disarray of HPC Performance Data

Alfredo Gimenez, Todd Gamblin, Abhinav Bhatele, Chad Wood, Kathleen Shoga, Aniruddha Marathe, Peer-Timo Bremer, Bernd Hamann, Martin Schulz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Modern HPC centers comprise clusters, storage, networks, power and cooling infrastructure, and more. Analyzing the efficiency of these complex facilities is a daunting task. Increasingly, facilities deploy sensors and monitoring tools, but with millions of instrumented components, analyzing collected data manually is intractable. Data from an HPC center comprises different formats, granularities, and semantics, and handwritten scripts no longer suffice to transform the data into a digestible form. We present ScrubJay, an intuitive, scalable framework for automatic analysis of disparate HPC data. ScrubJay decouples the task of specifying data relationships from the task of analyzing data. Domain experts can store reusable transformations that describe relations between domains. ScrubJay also automates performance analysis. Analysts provide a query over logical domains of interest, and ScrubJay automatically derives the steps needed to transform raw measurements. ScrubJay makes large-scale analysis tractable and reproducible, and provides insights into HPC facilities.
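
The query-to-derivation idea in the abstract can be illustrated with a toy planner. This is only a minimal sketch under stated assumptions, not ScrubJay's actual API or algorithm: the `Transformation` type, the `derive_plan` function, the rule names, and the column names are all hypothetical, and breadth-first search is simply one plausible way to find a sequence of stored transformations that reaches the requested domains.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Transformation(NamedTuple):
    """Hypothetical reusable rule: given the domains in `needs`, derive `adds`."""
    name: str
    needs: FrozenSet[str]
    adds: FrozenSet[str]


def derive_plan(
    available: Iterable[str],
    wanted: Iterable[str],
    rules: List[Transformation],
) -> Optional[List[str]]:
    """Breadth-first search for a shortest rule sequence that covers `wanted`.

    Returns the list of rule names to apply in order, or None if the
    requested domains cannot be reached from the available ones.
    """
    start = frozenset(available)
    goal = frozenset(wanted)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        have, steps = queue.popleft()
        if goal <= have:
            return steps
        for rule in rules:
            # A rule applies if its inputs are present and it adds something new.
            if rule.needs <= have and not rule.adds <= have:
                nxt = have | rule.adds
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [rule.name]))
    return None


# Hypothetical relations a domain expert might store once and reuse.
RULES = [
    Transformation("job_to_nodes", frozenset({"job_id"}), frozenset({"node"})),
    Transformation("node_to_rack", frozenset({"node"}), frozenset({"rack"})),
    Transformation("rack_to_power", frozenset({"rack", "time"}), frozenset({"rack_power"})),
]

# The analyst asks for rack power per job; the planner works out the steps.
plan = derive_plan({"job_id", "time"}, {"job_id", "rack_power"}, RULES)
print(plan)  # ['job_to_nodes', 'node_to_rack', 'rack_to_power']
```

In this sketch the analyst names only the domains of interest, while the stored rules carry the knowledge of how domains relate, which mirrors the separation of concerns the abstract describes.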

Original language: English
Title of host publication: SC 2017 - International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
ISBN (Electronic): 9781450351140
State: Published - 2017
Event: 2017 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017 - Denver, United States
Duration: 12 Nov 2017 to 17 Nov 2017

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Volume: 2017-November
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2017 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017
Country/Territory: United States
City: Denver
Period: 12/11/17 to 17/11/17

Keywords

  • Facility Monitoring
  • HPC Performance Analysis
  • Performance Tools
