Abstract
While there has been substantial recent work studying generalization of neural networks, the ability of deep networks in automating the process of feature extraction still evades a thorough mathematical understanding. As a step toward this goal, we analyze learning and generalization of a three-layer neural network with ReLU activations in a regime that goes beyond the linear approximation of the network and is hence not captured by the common Neural Tangent Kernel. We show that despite nonconvexity of the empirical loss, a variant of SGD converges in polynomially many iterations to a good solution that generalizes. In particular, our generalization bounds are adaptive: they automatically optimize over a family of kernels that includes the Neural Tangent Kernel to provide the tightest bound.
Original language | English |
---|---|
State | Published - 2022 |
Externally published | Yes |
Event | 10th International Conference on Learning Representations, ICLR 2022 - Virtual, Online Duration: 25 Apr 2022 → 29 Apr 2022 |
Conference
Conference | 10th International Conference on Learning Representations, ICLR 2022 |
---|---|
City | Virtual, Online |
Period | 25/04/22 → 29/04/22 |