TY - GEN
T1 - Complexity of finding stationary points of nonsmooth nonconvex functions
AU - Zhang, Jingzhao
AU - Lin, Hongzhou
AU - Jegelka, Stefanie
AU - Sra, Suvrit
AU - Jadbabaie, Ali
N1 - Publisher Copyright:
© 2020 by the Authors. All rights reserved.
PY - 2020
Y1 - 2020
N2 - We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an ε-stationary point with first-order methods is impossible in finite time. We then introduce the notion of (δ, ε)-stationarity, which allows for an ε-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance δ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a (δ, ε)-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on δ. Empirically, our methods perform well for training ReLU neural networks.
AB - We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an ε-stationary point with first-order methods is impossible in finite time. We then introduce the notion of (δ, ε)-stationarity, which allows for an ε-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance δ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a (δ, ε)-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on δ. Empirically, our methods perform well for training ReLU neural networks.
UR - http://www.scopus.com/inward/record.url?scp=85104651148&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85104651148
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 11107
EP - 11116
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daumé, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -