ADAPTIVE GENERALIZATION AND OPTIMIZATION OF THREE-LAYER NEURAL NETWORKS

Khashayar Gatmiry, Stefanie Jegelka, Jonathan Kelner

Research output: Contribution to conference › Paper › peer-review

Abstract

While there has been substantial recent work studying generalization of neural networks, the ability of deep networks to automate the process of feature extraction still evades a thorough mathematical understanding. As a step toward this goal, we analyze learning and generalization of a three-layer neural network with ReLU activations in a regime that goes beyond the linear approximation of the network and is hence not captured by the common Neural Tangent Kernel. We show that despite the nonconvexity of the empirical loss, a variant of SGD converges in polynomially many iterations to a good solution that generalizes. In particular, our generalization bounds are adaptive: they automatically optimize over a family of kernels that includes the Neural Tangent Kernel, providing the tightest bound.
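For concreteness, the sketch below shows a generic three-layer ReLU network of the kind the abstract refers to, trained with plain SGD on a nonconvex empirical loss. This is an illustrative stand-in only, assuming PyTorch and synthetic data; the widths, step size, and plain SGD here are hypothetical and do not reproduce the paper's specific architecture or SGD variant.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a three-layer ReLU network trained with plain SGD
# on synthetic regression data. The paper analyzes a particular three-layer
# architecture and a variant of SGD; all hyperparameters here are placeholders.

torch.manual_seed(0)

d, m1, m2, n = 10, 64, 64, 256           # input dim, hidden widths, sample count
X = torch.randn(n, d)                    # synthetic inputs
y = torch.randn(n, 1)                    # synthetic regression targets

model = nn.Sequential(
    nn.Linear(d, m1), nn.ReLU(),         # first hidden layer, ReLU activation
    nn.Linear(m1, m2), nn.ReLU(),        # second hidden layer, ReLU activation
    nn.Linear(m2, 1),                    # linear output layer
)

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X), y)          # nonconvex empirical loss
    loss.backward()
    opt.step()
```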

Original language: English
State: Published - 2022
Externally published: Yes
Event: 10th International Conference on Learning Representations, ICLR 2022 - Virtual, Online
Duration: 25 Apr 2022 - 29 Apr 2022

Conference

Conference: 10th International Conference on Learning Representations, ICLR 2022
City: Virtual, Online
Period: 25/04/22 - 29/04/22
