Project Details
Description
Trained deep neural networks generalize very well despite being trained with a number of training samples that is significantly lower than the number of parameters. This surprising phenomenon goes against traditional wisdom, which associates overfitting with poor generalization. In such an overparametrized setting the loss function possesses many global minima corresponding to neural networks that interpolate the data, and the learning algorithm induces an implicit bias towards certain favored solutions. By the principle of Occam's razor, good generalization can be expected to be connected with networks of low complexity, and the standard algorithm of (stochastic) gradient descent indeed appears to favor networks whose complexity is much lower than the number of parameters would suggest.

In this project we aim to advance recent first results which show, for the simple case of deep linear networks, that training via gradient descent promotes an implicit bias towards network weights whose product is a low-rank matrix. We intend to significantly extend the theory in this direction and to exploit this mechanism for the reliable solution of low-rank matrix recovery problems such as matrix completion.

We further aim to contribute to the theoretical foundations of the implicit bias of gradient descent and its stochastic variants for learning deep nonlinear networks. Evidence suggests that the bias is again towards networks of low complexity, whose nature we intend to explore. We will leverage the intrinsic low complexity of trained nonlinear networks to design novel algorithms for their compression. In particular, we aim to extend to deep networks recent results that relate approximate second-order network differentials to certain non-orthogonal rank-one decompositions encoding the optimal weights. We plan to prove that the optimal weights can be computed stably and reliably. As a byproduct, we will show robust and unique identification of generic deep networks from a minimal number of samples. Besides advancing the theory, the project will develop new algorithms and software of practical relevance for machine learning, the solution of inverse problems, and the compression of neural networks for use on mobile devices.
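The low-rank bias for deep linear networks mentioned above can be illustrated with a small numerical experiment. The following is a minimal, self-contained sketch (a hypothetical illustration, not project code): plain gradient descent is run on a depth-three linear factorization W3·W2·W1 fitted to partially observed entries of a rank-2 matrix, and with small initialization the product ends up close to the target with low effective rank. The depth, matrix sizes, initialization scale, and step size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 2                                   # ambient dimension, ground-truth rank
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n)) / np.sqrt(n)
mask = rng.random((n, n)) < 0.4                # roughly 40% of entries observed

scale = 1e-2                                   # small initialization promotes the low-rank bias
W1, W2, W3 = (scale * rng.standard_normal((n, n)) for _ in range(3))

lr = 0.05
for _ in range(15000):
    W = W3 @ W2 @ W1                           # end-to-end (product) matrix
    R = mask * (W - M)                         # residual on observed entries only
    g1 = (W3 @ W2).T @ R                       # gradients of 0.5 * ||mask * (W - M)||_F^2
    g2 = W3.T @ R @ W1.T
    g3 = R @ (W2 @ W1).T
    W1, W2, W3 = W1 - lr * g1, W2 - lr * g2, W3 - lr * g3

W = W3 @ W2 @ W1
sv = np.linalg.svd(W, compute_uv=False)
print("leading singular values of the product:", np.round(sv[:5], 4))
rel_err = np.linalg.norm((~mask) * (W - M)) / np.linalg.norm((~mask) * M)
print("relative error on unobserved entries:", round(float(rel_err), 4))
```

In a typical run of this kind, only about r of the singular values of the product are non-negligible and the unobserved entries are recovered accurately, which is the behavior the project aims to explain and exploit for low-rank matrix recovery.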
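For context on the second-order approach mentioned above, the known shallow (one-hidden-layer) case admits a compact statement; extending such identities to deep networks is one of the project goals. The sketch below uses illustrative notation not taken from the project: f denotes a shallow network with activation φ, and its Hessian decomposes into generally non-orthogonal rank-one terms built from the weight vectors w_i.

```latex
% Shallow-network identity linking second-order differentials to a
% non-orthogonal rank-one decomposition encoding the weights (illustrative notation):
\[
  f(x) = \sum_{i=1}^{m} a_i\,\phi\big(\langle w_i, x\rangle + b_i\big),
  \qquad
  \nabla^2 f(x) = \sum_{i=1}^{m} a_i\,\phi''\big(\langle w_i, x\rangle + b_i\big)\, w_i w_i^{\top}.
\]
% Approximate Hessians of f at several inputs therefore span a space of matrices
% whose rank-one constituents w_i w_i^T encode the (unknown) weight vectors.
```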
| Status | Active |
| --- | --- |
| Effective start/end date | 1/01/21 → … |