TY - GEN
T1 - Leveraging CPU-FPGA Co-design for Matrix Profile Computation
AU - Huseynli, Fariz
AU - Raoofy, Amir
AU - Schulz, Martin
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Current technology trends in high-performance computing (HPC) are pushing us towards accelerated systems. While GPU-based systems are the most common option, not all applications work well on such architectures. Solutions like programmable hardware in the form of FPGAs (Field Programmable Gate Arrays) can be a powerful alternative. However, the complexity of developing specialized compute units in FPGAs, which are optimized for a specific task, often limits their broad utilization. In this paper, we follow a co-design methodology to identify the key computational routines and to replace them with user-friendly libraries that wrap complex FPGA access mechanisms. This simplifies the use of specialized compute units in FPGAs. To demonstrate our approach, we focus on performance improvements for an HPC/BigData application called (MP)N, which is built around a widely used data analytics algorithm that computes the matrix profile for multidimensional time series. In this application, we identify a sorting kernel as one of the key time consumers and accelerate it by designing a parallel sorting library and using it to offload sorting batches to the FPGA. At the same time, we enable efficient utilization of CPU resources through overlap and pipelining. We achieve a 2-fold run-time improvement for computing a 128-dimensional time series of 7 million records, with the performance gap increasing as the number of records grows, highlighting the potential of CPU-FPGA co-design in HPC.
AB - Current technology trends in high-performance computing (HPC) are pushing us towards accelerated systems. While GPU-based systems are the most common option, not all applications work well on such architectures. Solutions like programmable hardware in the form of FPGAs (Field Programmable Gate Arrays) can be a powerful alternative. However, the complexity of developing specialized compute units in FPGAs, which are optimized for a specific task, often limits their broad utilization. In this paper, we follow a co-design methodology to identify the key computational routines and to replace them with user-friendly libraries that wrap complex FPGA access mechanisms. This simplifies the use of specialized compute units in FPGAs. To demonstrate our approach, we focus on performance improvements for an HPC/BigData application called (MP)N, which is built around a widely used data analytics algorithm that computes the matrix profile for multidimensional time series. In this application, we identify a sorting kernel as one of the key time consumers and accelerate it by designing a parallel sorting library and using it to offload sorting batches to the FPGA. At the same time, we enable efficient utilization of CPU resources through overlap and pipelining. We achieve a 2-fold run-time improvement for computing a 128-dimensional time series of 7 million records, with the performance gap increasing as the number of records grows, highlighting the potential of CPU-FPGA co-design in HPC.
KW - CPU-FPGA Co-Design
KW - HPC
KW - Time Series Mining
UR - http://www.scopus.com/inward/record.url?scp=85219178372&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-80084-9_9
DO - 10.1007/978-3-031-80084-9_9
M3 - Conference contribution
AN - SCOPUS:85219178372
SN - 9783031800832
T3 - Communications in Computer and Information Science
SP - 127
EP - 141
BT - High Performance Computing - 11th Latin American High Performance Computing Conference, CARLA 2024, Revised Selected Papers
A2 - Guerrero, Ginés
A2 - San Martín, Jaime
A2 - Meneses, Esteban
A2 - Barrios Hernández, Carlos Jaime
A2 - Osthoff, Carla
A2 - Monsalve Diaz, Jose M.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 11th Latin American High Performance Computing Conference, CARLA 2024
Y2 - 30 September 2024 through 4 October 2024
ER -