Higher Order Statistical Decorrelation without Information Loss

Gustavo Deco, Wilfried Brauer

Research output: Contribution to conference › Paper › peer-review

20 Scopus citations

Abstract

A neural network learning paradigm based on information theory is proposed as a way to perform, in an unsupervised fashion, redundancy reduction among the elements of the output layer without loss of information from the sensory input. The model performs nonlinear decorrelation up to higher orders of the cumulant tensors and results in probabilistically independent components of the output layer. Consequently, no Gaussian distribution needs to be assumed at either the input or the output. The theory presented is related to the unsupervised-learning theory of Barlow, which proposes redundancy reduction as the goal of cognition. When nonlinear units are used, nonlinear principal component analysis is obtained; in this case, nonlinear manifolds can be reduced to minimum-dimension manifolds. If linear units are used, the network performs a generalized principal component analysis in the sense that non-Gaussian distributions can be linearly decorrelated and higher orders of the correlation tensors are also taken into account. The basic structure of the architecture involves a general transformation that is volume-conserving and therefore entropy-conserving, yielding a map without loss of information. Minimization of the mutual information among the output neurons eliminates the redundancy between the outputs and results in statistical decorrelation of the extracted features. This is known as factorial learning.
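
The volume-conservation argument in the abstract can be illustrated with a small numerical check. For an invertible differentiable map, the differential entropy of the output equals that of the input plus the expected log of the absolute Jacobian determinant, so a map whose Jacobian determinant is 1 everywhere leaves the entropy unchanged. The sketch below is an illustration only, not the authors' architecture: the function `triangular_map`, its weight `w`, and the choice of `tanh` are hypothetical, picked because a triangular map of this form has a unit Jacobian determinant by construction.

```python
import math

def triangular_map(x, w=0.5):
    """Hypothetical volume-conserving map: y1 = x1, y2 = x2 + w * tanh(x1)."""
    x1, x2 = x
    return (x1, x2 + w * math.tanh(x1))

def jacobian_det(f, x, eps=1e-6):
    """Determinant of the 2x2 Jacobian of f at x, via central differences."""
    x1, x2 = x
    # Perturb the first input coordinate.
    a_plus, a_minus = f((x1 + eps, x2)), f((x1 - eps, x2))
    # Perturb the second input coordinate.
    b_plus, b_minus = f((x1, x2 + eps)), f((x1, x2 - eps))
    d11 = (a_plus[0] - a_minus[0]) / (2 * eps)  # dy1/dx1
    d21 = (a_plus[1] - a_minus[1]) / (2 * eps)  # dy2/dx1
    d12 = (b_plus[0] - b_minus[0]) / (2 * eps)  # dy1/dx2
    d22 = (b_plus[1] - b_minus[1]) / (2 * eps)  # dy2/dx2
    return d11 * d22 - d12 * d21

# Evaluate the determinant at a few arbitrary test points.
points = [(-2.0, 1.0), (0.0, 0.0), (0.7, -3.0)]
dets = [jacobian_det(triangular_map, p) for p in points]
```

Because the Jacobian is lower triangular with ones on the diagonal, each entry of `dets` should be 1 up to finite-difference error, whatever nonlinearity is placed in the off-diagonal term. Under such a map the joint entropy is fixed, so reducing the sum of the marginal output entropies is equivalent to reducing the mutual information among the outputs.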

Original language: English
Pages: 247-254
Number of pages: 8
State: Published - 1994
Event: 7th International Conference on Neural Information Processing Systems, NIPS 1994 - Denver, United States
Duration: 1 Jan 1994 → 1 Jan 1994

Conference

Conference: 7th International Conference on Neural Information Processing Systems, NIPS 1994
Country/Territory: United States
City: Denver
Period: 1/01/94 → 1/01/94
