TY - GEN
T1 - Overcoming Weak Scaling Challenges in Tree-Based Nearest Neighbor Time Series Mining
AU - Raoofy, Amir
AU - Karlstetter, Roman
AU - Schreiber, Martin
AU - Trinitis, Carsten
AU - Schulz, Martin
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - The mining of time series data plays an important role in modern information retrieval and monitoring infrastructures. In particular, the identification of similarities within and across large time series is of great importance in analytics and knowledge discovery. For this task, the matrix profile similarity indexing approach, which encodes the correlations among snapshots of a time series, is well-established. However, it is computationally expensive, especially for long time series, as existing exact approaches mostly rely on exhaustive, exact query (search) operations and are inefficient. Similarly, existing approximate approaches are limited with respect to parallelism, scalability, or their extent of practicality. We, therefore, focus on an approximate parallel tree-based nearest-neighbors approach and address the weak scaling challenges raised when applied to large time series in HPC settings. We build on the existing concept of parallel iterative tree-based nearest neighbor solvers and introduce a novel approach for the approximate calculation of the matrix profile. To improve the performance and overcome weak scalability challenges, we exploit a mix of creating a forest of parallel trees on exclusive ensembles of resources combined with pipelining of iterations. We provide an implementation targeting large-scale CPU-based HPC systems and illustrate the performance of this new approach with experimental data. Finally, we demonstrate the mining of time series at billion-records-scale datasets on the SuperMUC-NG system.
AB - The mining of time series data plays an important role in modern information retrieval and monitoring infrastructures. In particular, the identification of similarities within and across large time series is of great importance in analytics and knowledge discovery. For this task, the matrix profile similarity indexing approach, which encodes the correlations among snapshots of a time series, is well-established. However, it is computationally expensive, especially for long time series, as existing exact approaches mostly rely on exhaustive, exact query (search) operations and are inefficient. Similarly, existing approximate approaches are limited with respect to parallelism, scalability, or their extent of practicality. We, therefore, focus on an approximate parallel tree-based nearest-neighbors approach and address the weak scaling challenges raised when applied to large time series in HPC settings. We build on the existing concept of parallel iterative tree-based nearest neighbor solvers and introduce a novel approach for the approximate calculation of the matrix profile. To improve the performance and overcome weak scalability challenges, we exploit a mix of creating a forest of parallel trees on exclusive ensembles of resources combined with pipelining of iterations. We provide an implementation targeting large-scale CPU-based HPC systems and illustrate the performance of this new approach with experimental data. Finally, we demonstrate the mining of time series at billion-records-scale datasets on the SuperMUC-NG system.
UR - http://www.scopus.com/inward/record.url?scp=85161169092&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-32041-5_17
DO - 10.1007/978-3-031-32041-5_17
M3 - Conference contribution
AN - SCOPUS:85161169092
SN - 9783031320408
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 317
EP - 338
BT - High Performance Computing - 38th International Conference, ISC High Performance 2023, Proceedings
A2 - Bhatele, Abhinav
A2 - Hammond, Jeff
A2 - Baboulin, Marc
A2 - Kruse, Carola
PB - Springer Science and Business Media Deutschland GmbH
T2 - 38th International Conference on High Performance Computing, ISC High Performance 2023
Y2 - 21 May 2023 through 25 May 2023
ER -