TY - GEN
T1 - Memory usage optimizations for online event analysis
AU - Hilbrich, Tobias
AU - Protze, Joachim
AU - Wagner, Michael
AU - Müller, Matthias S.
AU - Schulz, Martin
AU - de Supinski, Bronis R.
AU - Nagel, Wolfgang E.
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - Tools are essential for application developers and system support personnel during tasks such as performance optimization and debugging of massively parallel applications. An important class are event-based tools that analyze relevant events during the runtime of an application, e.g., function invocations or communication operations. We develop a parallel tools infrastructure that supports both the observation and analysis of application events at runtime. Some analyses—e.g., deadlock detection algorithms—require complex processing and apply to many types of frequently occurring events. For situations where the rate at which an application generates new events exceeds the processing rate of the analysis, we experience tool instability or even failures, e.g., memory exhaustion. Tool infrastructures must provide means to avoid or mitigate such situations. This paper explores two such techniques: first, a heuristic that selects events to receive and process next; second, a pause mechanism that temporarily suspends the execution of an application. An application study with applications from the SPEC MPI2007 benchmark suite and the NAS parallel benchmarks evaluates these techniques at up to 16,384 processes and illustrates how they avoid memory exhaustion problems that limited the applicability of a runtime correctness tool in the past.
UR - http://www.scopus.com/inward/record.url?scp=84928898083&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-15976-8_8
DO - 10.1007/978-3-319-15976-8_8
M3 - Conference contribution
AN - SCOPUS:84928898083
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 110
EP - 121
BT - Solving Software Challenges for Exascale - International Conference on Exascale Applications and Software, EASC 2014, Revised Selected Papers
A2 - Markidis, Stefano
A2 - Laure, Erwin
PB - Springer Verlag
T2 - International Conference on Exascale Applications and Software, EASC 2014
Y2 - 2 April 2014 through 3 April 2014
ER -