TY - GEN

T1 - Efficient Distributed Machine Learning via Combinatorial Multi-Armed Bandits

AU - Egger, Maximilian

AU - Bitar, Rawad

AU - Wachter-Zeh, Antonia

AU - Gündüz, Deniz

N1 - Publisher Copyright: © 2022 IEEE.

PY - 2022

Y1 - 2022

N2 - We consider the distributed stochastic gradient descent problem, where a main node distributes gradient calculations among n workers, of which at most b ≤ n can be utilized in parallel. By assigning tasks to all the workers and waiting only for the k fastest ones, the main node can trade off the error of the algorithm against its runtime by gradually increasing k as the algorithm evolves. However, this strategy, referred to as adaptive k-sync, can incur additional costs since it ignores the computational effort of slow workers. We propose a cost-efficient scheme that assigns tasks only to k workers and gradually increases k. As the response times of the available workers are unknown to the main node a priori, we utilize a combinatorial multi-armed bandit model to learn which workers are the fastest while assigning gradient calculations, and to minimize the effect of slow workers. Assuming that the response times of the workers are independent and exponentially distributed with different means, we give empirical and theoretical guarantees on the regret of our strategy, i.e., the extra time spent to learn the mean response times of the workers. Compared to adaptive k-sync, our scheme achieves significantly lower errors with the same computational effort while being inferior in terms of speed.

AB - We consider the distributed stochastic gradient descent problem, where a main node distributes gradient calculations among n workers, of which at most b ≤ n can be utilized in parallel. By assigning tasks to all the workers and waiting only for the k fastest ones, the main node can trade off the error of the algorithm against its runtime by gradually increasing k as the algorithm evolves. However, this strategy, referred to as adaptive k-sync, can incur additional costs since it ignores the computational effort of slow workers. We propose a cost-efficient scheme that assigns tasks only to k workers and gradually increases k. As the response times of the available workers are unknown to the main node a priori, we utilize a combinatorial multi-armed bandit model to learn which workers are the fastest while assigning gradient calculations, and to minimize the effect of slow workers. Assuming that the response times of the workers are independent and exponentially distributed with different means, we give empirical and theoretical guarantees on the regret of our strategy, i.e., the extra time spent to learn the mean response times of the workers. Compared to adaptive k-sync, our scheme achieves significantly lower errors with the same computational effort while being inferior in terms of speed.

UR - http://www.scopus.com/inward/record.url?scp=85136260894&partnerID=8YFLogxK

U2 - 10.1109/ISIT50566.2022.9834499

DO - 10.1109/ISIT50566.2022.9834499

M3 - Conference contribution

AN - SCOPUS:85136260894

T3 - IEEE International Symposium on Information Theory - Proceedings

SP - 1653

EP - 1658

BT - 2022 IEEE International Symposium on Information Theory, ISIT 2022

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2022 IEEE International Symposium on Information Theory, ISIT 2022

Y2 - 26 June 2022 through 1 July 2022

ER -