TY - GEN
T1 - Runtime-guided mitigation of manufacturing variability in power-constrained multi-socket NUMA nodes
AU - Chasapis, Dimitrios
AU - Casas, Marc
AU - Moretó, Miquel
AU - Schulz, Martin
AU - Ayguadé, Eduard
AU - Labarta, Jesus
AU - Valero, Mateo
N1 - Publisher Copyright: © 2016 ACM.
PY - 2016/6/1
Y1 - 2016/6/1
AB - Current large scale systems show increasing power demands, to the point that it has become a huge strain on facilities and budgets. Researchers in academia, labs and industry are focusing on dealing with this "power wall", striving to find a balance between performance and power consumption. Some commodity processors enable power capping, which opens up new opportunities for applications to directly manage their power behavior at user level. However, while power capping ensures a system will never exceed a given power limit, it also leads to a new form of heterogeneity: natural manufacturing variability, which was previously hidden by varying power to achieve homogeneous performance, now results in heterogeneous performance caused by different CPU frequencies, potentially for each core, to enforce the power limit. In this work we show how a parallel runtime system can be used to effectively deal with this new kind of performance heterogeneity by compensating the uneven effects of power capping. In the context of a NUMA node composed of several multi-core sockets, our system is able to optimize the energy and concurrency levels assigned to each socket to maximize performance. Applied transparently within the parallel runtime system, it does not require any programmer interaction like changing the application source code or manually reconfiguring the parallel system. We compare our novel runtime analysis with an offline approach and demonstrate that it can achieve equal performance at a fraction of the cost.
KW - High performance computing
KW - Manufacturing variability
KW - Parallel architectures
KW - Parallel programming
KW - Parallel runtimes
KW - Power and energy
UR - http://www.scopus.com/inward/record.url?scp=84978540193&partnerID=8YFLogxK
U2 - 10.1145/2925426.2926279
DO - 10.1145/2925426.2926279
M3 - Conference contribution
AN - SCOPUS:84978540193
T3 - Proceedings of the International Conference on Supercomputing
BT - Proceedings of the 2016 International Conference on Supercomputing, ICS 2016
PB - Association for Computing Machinery
T2 - 30th International Conference on Supercomputing, ICS 2016
Y2 - 1 June 2016 through 3 June 2016
ER -