TY - GEN
T1 - Three Operator Splitting with Subgradients, Stochastic Gradients, and Adaptive Learning Rates
AU - Yurtsever, Alp
AU - Gu, Alex
AU - Sra, Suvrit
N1 - Publisher Copyright:
© 2021 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Three Operator Splitting (TOS) (Davis & Yin, 2017) can minimize the sum of multiple convex functions effectively when an efficient gradient oracle or proximal operator is available for each term. This requirement often fails in machine learning applications: (i) instead of full gradients, only stochastic gradients may be available; and (ii) instead of proximal operators, using subgradients to handle complex penalty functions may be more efficient and realistic. Motivated by these concerns, we analyze three potentially valuable extensions of TOS. The first two permit using subgradients and stochastic gradients and are shown to ensure an O(1/√t) convergence rate. The third extension, AdapTOS, endows TOS with adaptive stepsizes. For the important setting of optimizing a convex loss over the intersection of convex sets, AdapTOS attains universal convergence rates, i.e., the rate adapts to the unknown smoothness degree of the objective function. We compare our proposed methods with competing methods on various applications.
AB - Three Operator Splitting (TOS) (Davis & Yin, 2017) can minimize the sum of multiple convex functions effectively when an efficient gradient oracle or proximal operator is available for each term. This requirement often fails in machine learning applications: (i) instead of full gradients, only stochastic gradients may be available; and (ii) instead of proximal operators, using subgradients to handle complex penalty functions may be more efficient and realistic. Motivated by these concerns, we analyze three potentially valuable extensions of TOS. The first two permit using subgradients and stochastic gradients and are shown to ensure an O(1/√t) convergence rate. The third extension, AdapTOS, endows TOS with adaptive stepsizes. For the important setting of optimizing a convex loss over the intersection of convex sets, AdapTOS attains universal convergence rates, i.e., the rate adapts to the unknown smoothness degree of the objective function. We compare our proposed methods with competing methods on various applications.
UR - http://www.scopus.com/inward/record.url?scp=85131876010&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85131876010
T3 - Advances in Neural Information Processing Systems
SP - 19743
EP - 19756
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
PB - Neural Information Processing Systems Foundation
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -