SGD with shuffling: Optimal rates without component convexity and large epoch requirements

Research output: Contribution to journal › Conference article › peer-review


Abstract

We study without-replacement SGD for solving finite-sum optimization problems. Specifically, depending on how the indices of the finite sum are shuffled, we consider the RANDOMSHUFFLE (shuffle at the beginning of each epoch) and SINGLESHUFFLE (shuffle only once) algorithms. First, we establish minimax optimal convergence rates of these algorithms up to poly-log factors. Notably, our analysis is general enough to cover gradient-dominated nonconvex costs and, unlike existing optimal convergence results, does not rely on the convexity of the individual component functions. Second, assuming convexity of the individual components, we further sharpen the tight convergence results for RANDOMSHUFFLE by removing two drawbacks common to all prior art: the large number of epochs required for the results to hold, and the extra poly-log factor gaps relative to the lower bound.
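A minimal Python sketch may help fix ideas about the two shuffling schemes the abstract describes. The function name `shuffled_sgd`, the constant step size, and the least-squares test problem below are illustrative assumptions, not the paper's algorithm parameters or analyzed rates.

```python
import numpy as np

def shuffled_sgd(grads, n, epochs, lr, x0, scheme="random_shuffle"):
    """Without-replacement SGD on a finite sum f(x) = (1/n) * sum_i f_i(x).

    grads:  callable grads(i, x) returning the gradient of component f_i at x.
    scheme: "random_shuffle" draws a fresh permutation at every epoch;
            "single_shuffle" fixes one random permutation for all epochs.
    Illustrative sketch only; step size and constants are not the paper's.
    """
    rng = np.random.default_rng(0)
    x = x0.copy()
    perm = rng.permutation(n)          # reused every epoch by SINGLESHUFFLE
    for _ in range(epochs):
        if scheme == "random_shuffle":
            perm = rng.permutation(n)  # RANDOMSHUFFLE: reshuffle each epoch
        for i in perm:                 # one full pass, each index used once
            x -= lr * grads(i, x)
    return x

# Hypothetical usage: least squares with f_i(x) = 0.5 * (a_i^T x - b_i)^2,
# whose component gradient is (a_i^T x - b_i) * a_i.
rng = np.random.default_rng(1)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
g = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_hat = shuffled_sgd(g, n=100, epochs=50, lr=0.01, x0=np.zeros(5))
```

The only difference between the two schemes is where the permutation is drawn: inside the epoch loop (RANDOMSHUFFLE) or once before it (SINGLESHUFFLE). In both cases every component gradient is visited exactly once per epoch, in contrast to with-replacement sampling.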

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 2020-December
State: Published - 2020
Externally published: Yes
Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: 6 Dec 2020 – 12 Dec 2020
