TY - GEN
T1 - Load Balancing for Molecular Dynamics Simulations on Heterogeneous Architectures
AU - Seckler, Steffen
AU - Tchipev, Nikola
AU - Bungartz, Hans-Joachim
AU - Neumann, Philipp
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2017/2/1
Y1 - 2017/2/1
N2 - Upcoming exascale compute systems are expected to be built from heterogeneous hardware architectures. In line with this trend, various methods exist to handle clusters composed of CPUs, GPUs, or other accelerators. Most of these assume that each node has the same structure, for example a dual-socket system with an accelerator (GPU or Xeon Phi). The workload is then distributed homogeneously among the nodes. However, not all clusters fulfill this requirement. They might contain different partitions with and without accelerators. Furthermore, depending on the underlying problem to be solved, accelerator cards may perform better in native mode than with offloading. In addition, various aspects such as cooling may influence the performance of individual nodes. It therefore cannot always be assumed that the structure and performance of each node, and hence the performance of every MPI rank, are the same. In this contribution, we apply a k-d tree decomposition method to balance load on heterogeneous compute clusters. The algorithm is incorporated into the molecular dynamics simulation program ls1 mardyn. We present performance results for simulations executed on hybrid AMD Bulldozer-Intel Sandy Bridge, Intel Westmere-Intel Sandy Bridge, and Intel Xeon-Intel Xeon Phi architectures. The only prerequisite for the proposed algorithm is a cost estimation for different decompositions. It is hence expected to be applicable to a variety of n-body scenarios for which a domain decomposition is possible.
AB - Upcoming exascale compute systems are expected to be built from heterogeneous hardware architectures. In line with this trend, various methods exist to handle clusters composed of CPUs, GPUs, or other accelerators. Most of these assume that each node has the same structure, for example a dual-socket system with an accelerator (GPU or Xeon Phi). The workload is then distributed homogeneously among the nodes. However, not all clusters fulfill this requirement. They might contain different partitions with and without accelerators. Furthermore, depending on the underlying problem to be solved, accelerator cards may perform better in native mode than with offloading. In addition, various aspects such as cooling may influence the performance of individual nodes. It therefore cannot always be assumed that the structure and performance of each node, and hence the performance of every MPI rank, are the same. In this contribution, we apply a k-d tree decomposition method to balance load on heterogeneous compute clusters. The algorithm is incorporated into the molecular dynamics simulation program ls1 mardyn. We present performance results for simulations executed on hybrid AMD Bulldozer-Intel Sandy Bridge, Intel Westmere-Intel Sandy Bridge, and Intel Xeon-Intel Xeon Phi architectures. The only prerequisite for the proposed algorithm is a cost estimation for different decompositions. It is hence expected to be applicable to a variety of n-body scenarios for which a domain decomposition is possible.
KW - AMD Bulldozer
KW - Intel Xeon Phi
KW - heterogeneous
KW - k-d trees
KW - ls1 mardyn
KW - molecular dynamics
UR - http://www.scopus.com/inward/record.url?scp=85015244848&partnerID=8YFLogxK
U2 - 10.1109/HiPC.2016.021
DO - 10.1109/HiPC.2016.021
M3 - Conference contribution
AN - SCOPUS:85015244848
T3 - Proceedings - 23rd IEEE International Conference on High Performance Computing, HiPC 2016
SP - 101
EP - 110
BT - Proceedings - 23rd IEEE International Conference on High Performance Computing, HiPC 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd IEEE International Conference on High Performance Computing, HiPC 2016
Y2 - 19 December 2016 through 22 December 2016
ER -