Implicit Bias and Low Complexity Networks (iLOCO)

Project: Research

Project Details

Description

Trained deep neural networks generalize very well despite having far more parameters than training samples. This surprising phenomenon goes against traditional wisdom, which associates such overparametrization with overfitting and poor generalization. In the overparametrized setting, the loss functional possesses many global minima corresponding to neural networks that interpolate the data, and the learning algorithm induces an implicit bias towards certain favored solutions. By the principle of Occam's razor, good generalization can be expected to be connected with networks of low complexity, and it appears that the standard algorithm of (stochastic) gradient descent favors networks whose complexity is much lower than the number of parameters suggests.

In this project we aim to advance recent first results showing, for the simple case of deep linear networks, that training via gradient descent promotes an implicit bias towards network weights whose product is a low-rank matrix. We intend to significantly extend the theory in this direction and to exploit this mechanism for the reliable solution of low-rank matrix recovery problems such as matrix completion. We further aim to contribute to the theoretical foundations of the implicit bias of gradient descent and its stochastic variants for learning deep nonlinear networks. Evidence suggests that the bias is again towards low-complexity networks, whose nature we intend to explore.

We will leverage the intrinsic low complexity of trained nonlinear networks to design novel algorithms for their compression. In particular, we aim to extend to deep networks recent results that relate approximate second-order network differentials to certain non-orthogonal rank-one decompositions encoding the optimal weights. We plan to prove that the optimal weights can be computed stably and reliably. As a byproduct, we will show robust and unique identification of generic deep networks from a minimal number of samples. Besides these theoretical advances, the project will develop new algorithms and software of practical relevance for machine learning, the solution of inverse problems, and the compression of neural networks for use on mobile devices.
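The low-rank implicit bias for deep linear networks described above can be illustrated numerically. The following is a minimal sketch, not the project's method: plain gradient descent on an unregularized depth-3 factorization, fitted only to a random subset of the entries of a rank-2 matrix (matrix completion). All names and hyperparameters are illustrative choices and may need tuning.

```python
# Sketch of the implicit low-rank bias in deep linear networks: gradient
# descent on a depth-3 product W3 @ W2 @ W1, fitted to observed entries
# of a rank-2 ground-truth matrix. Nothing in the loss penalizes rank.
from functools import reduce
import numpy as np

rng = np.random.default_rng(0)
n, true_rank, depth = 10, 2, 3

# Ground-truth rank-2 matrix and a random ~50% observation mask.
M = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, n))
mask = rng.random((n, n)) < 0.5

# Small initialization: the regime in which the low-rank bias is observed.
Ws = [0.1 * rng.normal(size=(n, n)) for _ in range(depth)]

def product(factors):
    """Product W_k @ ... @ W_1 of the given factors (identity if empty)."""
    return reduce(lambda acc, W: W @ acc, factors, np.eye(n))

lr = 1e-3
for step in range(20000):
    R = mask * (product(Ws) - M)            # residual on observed entries only
    grads = []
    for k in range(depth):
        left = product(Ws[k + 1:])          # W_depth ... W_{k+1}
        right = product(Ws[:k])             # W_{k-1} ... W_1
        grads.append(left.T @ R @ right.T)  # grad of 0.5*||R||_F^2 w.r.t. W_k
    for W, g in zip(Ws, grads):
        W -= lr * g

s = np.linalg.svd(product(Ws), compute_uv=False)
print("singular values of the learned product:", np.round(s, 3))
```

With small initialization, the trailing singular values of the learned product typically remain near zero, so the product stays effectively rank-2, in line with the results on deep linear networks that the description refers to.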
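On the compression side, a generic baseline conveys how low effective rank translates into fewer parameters: truncated SVD of a weight matrix. This is a standard technique (Eckart–Young), not the project's rank-one decomposition via second-order differentials; the function name `compress_layer` and the synthetic spectrum are illustrative assumptions.

```python
# Generic low-rank compression baseline via truncated SVD. Shown only to
# make the parameter-count argument concrete; not the project's method.
import numpy as np

def compress_layer(W: np.ndarray, rank: int):
    """Best rank-`rank` approximation of W, returned as factors A, B with A @ B ~ W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]

rng = np.random.default_rng(1)
m = n = 256

# Synthetic "trained" weight matrix with a rapidly decaying spectrum,
# mimicking the low effective rank of trained layers.
U, _ = np.linalg.qr(rng.normal(size=(m, m)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
s = 2.0 ** -np.arange(n)                  # geometric singular-value decay
W = (U * s) @ V.T

A, B = compress_layer(W, rank=16)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"parameters: {W.size} -> {A.size + B.size}, relative error {rel_err:.1e}")
```

Storing the two factors instead of the full matrix reduces the parameter count from m*n to rank*(m+n), which is the kind of saving that makes compressed networks viable on mobile devices.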

Status: Active
Effective start/end date: 1/01/21 → …
