TY - GEN
T1 - Flux
T2 - 43rd International Conference on Parallel Processing Workshops, ICPPW 2014
AU - Ahn, Dong H.
AU - Garlick, Jim
AU - Grondona, Mark
AU - Lipari, Don
AU - Springmeyer, Becky
AU - Schulz, Martin
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2015/5/7
Y1 - 2015/5/7
AB - Resource and job management software is crucial to High Performance Computing (HPC) for efficient application execution. However, current systems and approaches can no longer keep up with the challenges that large HPC centers face due to ever-increasing system scales, resource and workload diversity, interplay between various resources (e.g., between compute clusters and a global file system), and complex resource constraints such as strict power budgets. To address this gap, we propose Flux, an extensible job and resource management framework specifically designed to meet the requirements of next-generation HPC centers. Flux targets an entire computing facility as one common pool of diverse resources, enabling the facility to accommodate site-wide constraints (e.g., power limits). Yet its distributed design still offers scalable and effective scheduling strategies. This paper details the design of Flux and describes and evaluates our initial prototype of its key run-time components. Our results show that the run-time prototype provides strong and predictable scalability.
KW - communication framework
KW - key value store
KW - resource management
KW - run-time
KW - scalable process management services
UR - http://www.scopus.com/inward/record.url?scp=84946563038&partnerID=8YFLogxK
U2 - 10.1109/ICPPW.2014.15
DO - 10.1109/ICPPW.2014.15
M3 - Conference contribution
AN - SCOPUS:84946563038
T3 - Proceedings of the International Conference on Parallel Processing Workshops
SP - 9
EP - 17
BT - Proceedings - 43rd International Conference on Parallel Processing Workshops, ICPPW 2014
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 September 2014 through 12 September 2014
ER -