sys-sage: A Unified Representation of Dynamic Topologies & Attributes on HPC Systems

Stepan Vanecek, Martin Schulz

Publication: Contribution to book/report/conference proceedings › Conference paper › Peer-reviewed

Abstract

HPC systems are getting ever more powerful, but this comes at the price of increasing system complexity: node architectures are deeply hierarchical and in many cases heterogeneous, and components can interact with each other in unpredictable ways. Further, current and future systems exhibit increasingly dynamic behavior, making static knowledge of their configuration alone insufficient. To use such systems efficiently, users as well as runtime systems have to be aware of the exact hardware structure at any time, i.e., the system's topology, its configuration parameters, any side effects a component can have on the rest of the system, and how all of these change over time. Current approaches to providing such information usually focus on a single aspect and do not consider dynamic behavior. For example, the widely used hwloc library, the current de facto standard for retrieving hardware topology information, provides a static hierarchical view of all node hardware but covers neither other system configuration aspects nor dynamic behavior; other tools have similar limitations. In this paper, we propose sys-sage, a novel approach that overcomes these limitations and goes beyond the functionality of existing tools, including hwloc. It can track dynamic changes while unifying access to all system topology and configuration data. As a result, it provides, at any point in time, a complete and up-to-date view of the HPC system on which an application or runtime system is executing. The novelty of our approach lies in combining static hardware topology information with other relevant system data in a single API, while enabling a dynamic view that exposes system updates and reconfigurations on the fly. We present the design of sys-sage and demonstrate its applicability on three separate use cases, as well as by presenting further scenarios not easily solvable with currently available tools.

Original language: English
Title of host publication: ICS 2024 - Proceedings of the 38th ACM International Conference on Supercomputing
Publisher: Association for Computing Machinery
Pages: 363-375
Number of pages: 13
ISBN (electronic): 9798400706103
Publication status: Published - 30 May 2024
Externally published: Yes
Event: 38th ACM International Conference on Supercomputing, ICS 2024 - Kyoto, Japan
Duration: 4 June 2024 to 7 June 2024

Publication series

Name: Proceedings of the International Conference on Supercomputing

Conference

Conference: 38th ACM International Conference on Supercomputing, ICS 2024
Country/Territory: Japan
City: Kyoto
Period: 4/06/24 to 7/06/24
