Sharp Analysis of Stochastic Optimization under Global Kurdyka-Łojasiewicz Inequality

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

17 Scopus citations

Abstract

We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies global Kurdyka-Łojasiewicz (KŁ) inequality and the queries from stochastic gradient oracles satisfy mild expected smoothness assumption. We first introduce a general framework to analyze Stochastic Gradient Descent (SGD) and its associated nonlinear dynamics under the setting. As a byproduct of our analysis, we obtain a sample complexity of O(ϵ−(4−α)) for SGD when the objective satisfies the so called α-PŁ condition, where α is the degree of gradient domination. Furthermore, we show that a modified SGD with variance reduction and restarting (PAGER) achieves an improved sample complexity of O(ϵ−2/α) when the objective satisfies the average smoothness assumption. This leads to the first optimal algorithm for the important case of α = 1 which appears in applications such as policy optimization in reinforcement learning.

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022
EditorsS. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, A. Oh
PublisherNeural information processing systems foundation
ISBN (Electronic)9781713871088
StatePublished - 2022
Externally publishedYes
Event36th Conference on Neural Information Processing Systems, NeurIPS 2022 - New Orleans, United States
Duration: 28 Nov 20229 Dec 2022

Publication series

NameAdvances in Neural Information Processing Systems
Volume35
ISSN (Print)1049-5258

Conference

Conference36th Conference on Neural Information Processing Systems, NeurIPS 2022
Country/TerritoryUnited States
CityNew Orleans
Period28/11/229/12/22

Fingerprint

Dive into the research topics of 'Sharp Analysis of Stochastic Optimization under Global Kurdyka-Łojasiewicz Inequality'. Together they form a unique fingerprint.

Cite this