TY - GEN
T1 - Distributed Order Recording Techniques for Efficient Record-and-Replay of Multi-Threaded Programs
AU - Fu, Xiang
AU - Meng, Shiman
AU - Zhang, Weiping
AU - Guo, Luanzheng
AU - Sato, Kento
AU - Ahn, Dong H.
AU - Laguna, Ignacio
AU - Lee, Gregory L.
AU - Schulz, Martin
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - After all these years and all these other shared-memory programming frameworks, OpenMP is still the most popular one. However, its greater levels of non-deterministic execution make debugging and testing more challenging. The ability to record and deterministically replay the program execution is key to addressing this challenge. However, scalably replaying OpenMP programs is still an unresolved problem. In this paper, we propose two novel techniques that use Distributed Clock (DC) and Distributed Epoch (DE) recording schemes to eliminate excessive thread synchronization for OpenMP record and replay. Our evaluation on representative HPC applications with ReOMP, which we used to realize DC and DE recording, shows that our approach is 2-5x more efficient than traditional approaches that synchronize on every shared-memory access. Furthermore, we demonstrate that our approach can be easily combined with MPI-level replay tools to replay non-trivial MPI+OpenMP applications. We achieve this by integrating ReOMP into ReMPI, an existing scalable MPI record-and-replay tool, with only a small MPI-scale-independent runtime overhead.
AB - After all these years and all these other shared-memory programming frameworks, OpenMP is still the most popular one. However, its greater levels of non-deterministic execution make debugging and testing more challenging. The ability to record and deterministically replay the program execution is key to addressing this challenge. However, scalably replaying OpenMP programs is still an unresolved problem. In this paper, we propose two novel techniques that use Distributed Clock (DC) and Distributed Epoch (DE) recording schemes to eliminate excessive thread synchronization for OpenMP record and replay. Our evaluation on representative HPC applications with ReOMP, which we used to realize DC and DE recording, shows that our approach is 2-5x more efficient than traditional approaches that synchronize on every shared-memory access. Furthermore, we demonstrate that our approach can be easily combined with MPI-level replay tools to replay non-trivial MPI+OpenMP applications. We achieve this by integrating ReOMP into ReMPI, an existing scalable MPI record-and-replay tool, with only a small MPI-scale-independent runtime overhead.
KW - Non-determinism
KW - OpenMP
KW - Record-and-Replay
KW - Reproducibility
UR - http://www.scopus.com/inward/record.url?scp=85211922581&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER59578.2024.00010
DO - 10.1109/CLUSTER59578.2024.00010
M3 - Conference contribution
AN - SCOPUS:85211922581
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 27
EP - 38
BT - Proceedings - 2024 IEEE International Conference on Cluster Computing, CLUSTER 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE International Conference on Cluster Computing, CLUSTER 2024
Y2 - 24 September 2024 through 27 September 2024
ER -