Small ReLU networks are powerful memorizers: A tight analysis of memorization capacity

Chulhee Yun, Suvrit Sra, Ali Jadbabaie

Publication: Contribution to journal › Conference article › Peer-reviewed

58 citations (Scopus)

Abstract

We study finite sample expressivity, i.e., the memorization power of ReLU networks. Recent results require N hidden nodes to memorize/interpolate arbitrary N data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with Ω(√N) hidden nodes can perfectly memorize most datasets with N points. We also prove that width Θ(√N) is necessary and sufficient for memorizing N data points, proving tight bounds on memorization capacity. The sufficiency result can be extended to deeper networks; we show that an L-layer network with W parameters in the hidden layers can memorize N data points if W = Ω(N). Combined with a recent upper bound O(WL log W) on VC dimension, our construction is nearly tight for any fixed L. Subsequently, we analyze memorization capacity of residual networks under a general position assumption; we prove results that substantially reduce the known requirement of N hidden nodes. Finally, we study the dynamics of stochastic gradient descent (SGD), and show that when initialized near a memorizing global minimum of the empirical risk, SGD quickly finds a nearby point with much smaller empirical risk.
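The sketch below is a minimal empirical illustration of the width scaling described in the abstract, not the paper's explicit construction or proof: it fits a 3-layer ReLU network whose hidden widths are on the order of √N to N randomly generated points and reports the resulting empirical risk. The dataset, dimensions, optimizer, and iteration count are illustrative assumptions.

```python
# Illustrative sketch (assumed setup, not the paper's construction):
# fit a 3-layer ReLU net with hidden width ~ sqrt(N) to N random points.
import math
import torch

torch.manual_seed(0)

N, d = 400, 10                      # N data points in d dimensions (assumed)
width = math.ceil(math.sqrt(N))     # hidden width on the order of sqrt(N)

X = torch.randn(N, d)
y = torch.randn(N, 1)               # arbitrary real labels to memorize

model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.ReLU(),
    torch.nn.Linear(width, width), torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(5000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()

print(f"hidden width = {width}, final empirical risk = {loss.item():.2e}")
```

How close the final empirical risk gets to zero depends on the optimizer and iteration budget; the paper's result is a constructive existence statement about such narrow networks, not a claim about this particular training run.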

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 32
Publication status: Published - 2019
Externally published: Yes
Event: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: 8 Dec 2019 - 14 Dec 2019
