Sparse autoencoder-based feature transfer learning for speech emotion recognition

Jun Deng, Zixing Zhang, Erik Marchi, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In speech emotion recognition, training and test data used for system development usually tend to fit each other perfectly, but further 'similar' data may be available. Transfer learning helps to exploit such similar data for training despite the inherent dissimilarities in order to boost a recogniser's performance. In this context, this paper presents a sparse autoencoder method for feature transfer learning for speech emotion recognition. In our proposed method, a common emotion-specific mapping rule is learnt from a small set of labelled data in a target domain. Then, newly reconstructed data are obtained by applying this rule to the emotion-specific data in a different domain. Experimental results on six standard databases show that our approach significantly improves the performance relative to learning each source domain independently.
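To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation: a single-hidden-layer sparse autoencoder (squared reconstruction error plus a KL-divergence sparsity penalty, in the standard formulation) is trained on labelled target-domain features of one emotion class, and source-domain features of the same class are then passed through it to obtain the 'newly reconstructed' data. All hyperparameter values, dimensions, and variable names below are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SparseAutoencoder:
    """Single-hidden-layer autoencoder with a KL-divergence sparsity penalty."""

    def __init__(self, n_in, n_hidden, rho=0.05, beta=3.0, lam=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        r = np.sqrt(6.0 / (n_in + n_hidden + 1))
        self.W1 = rng.uniform(-r, r, (n_hidden, n_in))   # encoder weights
        self.W2 = rng.uniform(-r, r, (n_in, n_hidden))   # decoder weights
        self.b1 = np.zeros(n_hidden)
        self.b2 = np.zeros(n_in)
        self.rho, self.beta, self.lam = rho, beta, lam

    def forward(self, X):
        H = sigmoid(X @ self.W1.T + self.b1)  # hidden activations
        R = sigmoid(H @ self.W2.T + self.b2)  # reconstruction of the input
        return H, R

    def fit(self, X, epochs=400, lr=0.5):
        m = X.shape[0]
        for _ in range(epochs):
            H, R = self.forward(X)
            rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)  # mean activation
            dR = (R - X) * R * (1 - R)  # delta at the output layer
            # KL sparsity gradient: pushes mean hidden activations towards rho
            kl = self.beta * (-self.rho / rho_hat + (1 - self.rho) / (1 - rho_hat))
            dH = (dR @ self.W2 + kl) * H * (1 - H)  # delta at the hidden layer
            self.W2 -= lr * (dR.T @ H / m + self.lam * self.W2)
            self.b2 -= lr * dR.mean(axis=0)
            self.W1 -= lr * (dH.T @ X / m + self.lam * self.W1)
            self.b1 -= lr * dH.mean(axis=0)

    def reconstruct(self, X):
        return self.forward(X)[1]

# Hypothetical usage: X_target holds labelled target-domain features for one
# emotion class, X_source holds source-domain features of the same class;
# both are assumed scaled to [0, 1]. The feature dimension (30) is made up.
X_target = np.random.default_rng(1).random((50, 30))
X_source = np.random.default_rng(2).random((200, 30))
sae = SparseAutoencoder(n_in=30, n_hidden=20)
sae.fit(X_target)
X_source_new = sae.reconstruct(X_source)  # 'newly reconstructed' data
```

In this reading of the abstract, the autoencoder trained on target-domain data acts as the common emotion-specific mapping rule, and applying it to source-domain data projects that data towards the target domain's representation before classifier training.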

Original language: English
Title of host publication: Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013
Pages: 511-516
Number of pages: 6
DOIs
State: Published - 2013
Event: 2013 5th Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013 - Geneva, Switzerland
Duration: 2 Sep 2013 – 5 Sep 2013

Publication series

Name: Proceedings - 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013

Conference

Conference: 2013 5th Humaine Association Conference on Affective Computing and Intelligent Interaction, ACII 2013
Country/Territory: Switzerland
City: Geneva
Period: 2/09/13 – 5/09/13

Keywords

  • Deep neural networks
  • Sparse autoencoder
  • Speech emotion recognition
  • Transfer learning
