Learning with synthesized speech for automatic emotion recognition

Björn Schuller, Felix Burkhardt

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

Data sparseness is an ever-present problem in automatic emotion recognition. Using artificially generated speech for training or adapting models could potentially ease this: though less natural than human speech, one can synthesize the exact spoken content in different emotional nuances, for many speakers and even in different languages. To investigate this potential, the phonemisation components Txt2Pho and openMary are used with Emofilt and Mbrola for emotional speech synthesis. Analysis is realized with our Munich open Emotion and Affect Recognition toolkit. As test sets we limit ourselves to the acted Berlin and eNTERFACE databases for the moment. The results show that synthesized speech can indeed be used for the recognition of human emotional speech.
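The cross-corpus setup the abstract describes (train on synthesized emotional speech, evaluate on recorded human speech) can be illustrated with a minimal sketch. The sketch below is not the authors' pipeline: simple MFCC mean/std functionals stand in for the openEAR feature set, the linear SVM is only one plausible classifier choice, and the "synthesized"/"human" directory layout is a hypothetical placeholder.

```python
# Conceptual sketch of the cross-corpus idea in the abstract:
# train an emotion classifier on synthesized speech, test on human speech.
# NOTE: not the paper's actual pipeline; features, classifier, and paths
# are stand-ins chosen for illustration only.
import glob
import os
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report


def extract_features(wav_path, sr=16000, n_mfcc=13):
    """Utterance-level functionals: MFCC means and standard deviations."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def load_corpus(root):
    """Assumes <root>/<emotion>/<file>.wav, e.g. synthesized/anger/utt01.wav."""
    X, y = [], []
    for wav in glob.glob(os.path.join(root, "*", "*.wav")):
        X.append(extract_features(wav))
        y.append(os.path.basename(os.path.dirname(wav)))  # emotion = folder name
    return np.array(X), np.array(y)


if __name__ == "__main__":
    # Hypothetical paths: Emofilt/Mbrola output vs. a human-acted database.
    X_train, y_train = load_corpus("synthesized")  # training: synthetic speech
    X_test, y_test = load_corpus("human")          # testing: human speech

    clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    clf.fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))
```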

Original language: English
Title of host publication: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5150-5153
Number of pages: 4
ISBN (Print): 9781424442966
DOIs
State: Published - 2010
Event: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010 - Dallas, TX, United States
Duration: 14 Mar 2010 - 19 Mar 2010

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2010
Country/Territory: United States
City: Dallas, TX
Period: 14/03/10 - 19/03/10

Keywords

  • Affective computing
  • Emotion recognition
  • Speech analysis
  • Speech synthesis
