Complexity of finding stationary points of nonsmooth nonconvex functions

Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Suvrit Sra, Ali Jadbabaie

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

5 Scopus citations

Abstract

We provide the first non-Asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an-stationary point with first-order methods is impossible in finite time. We then introduce the notion of (;)-stationarity, which allows for an-Approximate gradient to be the convex combination of generalized gradients evaluated at points within distance to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a (;)-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on. Empirically, our methods perform well for training ReLU neural networks.

Original languageEnglish
Title of host publication37th International Conference on Machine Learning, ICML 2020
EditorsHal Daume, Aarti Singh
PublisherInternational Machine Learning Society (IMLS)
Pages11107-11116
Number of pages10
ISBN (Electronic)9781713821120
StatePublished - 2020
Externally publishedYes
Event37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
Duration: 13 Jul 202018 Jul 2020

Publication series

Name37th International Conference on Machine Learning, ICML 2020
VolumePartF168147-15

Conference

Conference37th International Conference on Machine Learning, ICML 2020
CityVirtual, Online
Period13/07/2018/07/20

Fingerprint

Dive into the research topics of 'Complexity of finding stationary points of nonsmooth nonconvex functions'. Together they form a unique fingerprint.

Cite this