TY - GEN
T1 - Decoupled load balancing
AU - Pearce, Olga
AU - Gamblin, Todd
AU - De Supinski, Bronis R.
AU - Schulz, Martin
AU - Amato, Nancy M.
PY - 2015/1/24
Y1 - 2015/1/24
N2 - Modern scientific simulations divide work between parallel processors by decomposing a spatial domain of mesh cells, particles, or other elements. A balanced assignment of the computational load is critical for parallel performance. If the computation per element changes over the simulation time, simulations can use dynamic load balance algorithms to evenly redistribute work to processes. Graph partitioners are widely used and balance very effectively, but they do not strong scale well. Typical SPMD simulations wait while a load balance algorithm runs on all processors, so a poorly scaling algorithm can itself become a bottleneck. We observe that the load balance algorithm is separate from the main application computation and has its own scaling properties. We propose to decouple the load balance algorithm from the application, and to offload the load balance computation so that it runs concurrently with the application on a smaller number of processors. We demonstrate the costs of decoupling and offloading the load balancing algorithm from a Barnes-Hut application.
AB - Modern scientific simulations divide work between parallel processors by decomposing a spatial domain of mesh cells, particles, or other elements. A balanced assignment of the computational load is critical for parallel performance. If the computation per element changes over the simulation time, simulations can use dynamic load balance algorithms to evenly redistribute work to processes. Graph partitioners are widely used and balance very effectively, but they do not strong scale well. Typical SPMD simulations wait while a load balance algorithm runs on all processors, so a poorly scaling algorithm can itself become a bottleneck. We observe that the load balance algorithm is separate from the main application computation and has its own scaling properties. We propose to decouple the load balance algorithm from the application, and to offload the load balance computation so that it runs concurrently with the application on a smaller number of processors. We demonstrate the costs of decoupling and offloading the load balancing algorithm from a Barnes-Hut application.
KW - Load balance
KW - Parallel algorithm
KW - Performance
UR - http://www.scopus.com/inward/record.url?scp=84939168410&partnerID=8YFLogxK
U2 - 10.1145/2688500.2688539
DO - 10.1145/2688500.2688539
M3 - Conference contribution
AN - SCOPUS:84939168410
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 267
EP - 268
BT - 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015 - Proceedings
PB - Association for Computing Machinery
T2 - 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015
Y2 - 7 February 2015 through 11 February 2015
ER -