Clustering performance data efficiently at massive scales

Todd Gamblin, Bronis R. De Supinski, Martin Schulz, Rob Fowler, Daniel A. Reed

Publication: Contribution to book/report/conference proceedings, conference contribution, peer-reviewed

24 citations (Scopus)

Abstract

Existing supercomputers have hundreds of thousands of processor cores, and future systems may have hundreds of millions. Developers need detailed performance measurements to tune their applications and to exploit these systems fully. However, extreme scales pose unique challenges for performance-tuning tools, which can generate significant volumes of I/O. Compute-to-I/O ratios have increased drastically as systems have grown, and the I/O systems of large machines can handle the peak load from only a small fraction of cores. Tool developers need efficient techniques to analyze and to reduce performance data from large numbers of cores. We introduce CAPEK, a novel parallel clustering algorithm that enables in-situ analysis of performance data at run time. Our algorithm scales sub-linearly to 131,072 processes, running in less than one second even at that scale, which is fast enough for on-line use in production runs. The CAPEK implementation is fully generic and can be used for many types of analysis. We demonstrate its application to statistical trace sampling. Specifically, we use our algorithm to compute efficiently stratified sampling strategies for traces at run time. We show that such stratification can result in data-volume reduction of up to four orders of magnitude on current large-scale systems, with potential for greater reductions for future extreme-scale systems.
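The abstract's idea of using cluster membership to drive stratified trace sampling can be illustrated with a minimal sketch. This is not CAPEK and not the authors' sampling scheme: it assumes cluster labels are already available for every process rank and applies plain proportional allocation, a standard textbook choice swapped in for illustration. The function name `stratified_sample` and its parameters `labels`, `fraction`, and `seed` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(labels, fraction, seed=0):
    """Pick process ranks to trace, one stratum (cluster) at a time.

    labels   -- labels[rank] is the assumed cluster id of that rank
    fraction -- share of each stratum to keep, in (0, 1]
    Returns a sorted list of the selected ranks.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    rng = random.Random(seed)

    # Group ranks by cluster id; each group is one stratum.
    strata = defaultdict(list)
    for rank, label in enumerate(labels):
        strata[label].append(rank)

    selected = []
    for label in sorted(strata):
        members = strata[label]
        # Proportional allocation, but never drop a stratum entirely,
        # so small clusters of unusual processes stay represented.
        k = max(1, round(fraction * len(members)))
        selected.extend(rng.sample(members, k))
    return sorted(selected)


# Hypothetical run: 1,024 ranks that fall into three behavior clusters.
labels = [0] * 1000 + [1] * 20 + [2] * 4
traced = stratified_sample(labels, fraction=0.01)
```

In this made-up example only 12 of 1,024 ranks are traced, yet every cluster contributes at least one rank. A uniform random sample of the same size would often miss the two small clusters altogether, which is the motivation for stratifying by behavior.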

Original language: English
Title of host publication: ICS'10 - 2010 International Conference on Supercomputing
Pages: 243-252
Number of pages: 10
Publication status: Published - 2010
Externally published: Yes
Event: 24th ACM International Conference on Supercomputing, ICS'10 - Tsukuba, Ibaraki, Japan
Duration: 2 June 2010 to 4 June 2010

Publication series

Name: Proceedings of the International Conference on Supercomputing

Conference

Conference: 24th ACM International Conference on Supercomputing, ICS'10
Country/Territory: Japan
City: Tsukuba, Ibaraki
Period: 2/06/10 to 4/06/10
