TY - JOUR
T1 - How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study
AU - Erben, Alexander
AU - Mayer, Ruben
AU - Jacobsen, Hans-Arno
N1 - Publisher Copyright: © 2024, VLDB Endowment. All rights reserved.
PY - 2024
Y1 - 2024
AB - This paper aims to answer the question: Can deep learning models be cost-efficiently trained on a global market of spot VMs spanning different data centers and cloud providers? To provide guidance, we extensively evaluate the cost and throughput implications of training in different zones, continents, and clouds for representative CV, NLP and ASR models. To expand the current training options further, we compare the scalability potential for hybrid-cloud scenarios by adding cloud resources to on-premise hardware to improve training throughput. Finally, we show how leveraging spot instance pricing enables a new cost-efficient way to train models with multiple cheap VMs, trumping both more centralized and powerful hardware and even on-demand cloud offerings at competitive prices.
UR - http://www.scopus.com/inward/record.url?scp=85190671555&partnerID=8YFLogxK
U2 - 10.14778/3648160.3648165
DO - 10.14778/3648160.3648165
M3 - Conference article
AN - SCOPUS:85190671555
SN - 2150-8097
VL - 17
SP - 1214
EP - 1226
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 6
T2 - 50th International Conference on Very Large Data Bases, VLDB 2024
Y2 - 25 August 2024 through 29 August 2024
ER -