Towards a standard set of acoustic features for the processing of emotion in speech

Florian Eyben, Anton Batliner, Bjoern Schuller

Publication: Contribution to journal, conference article, peer-reviewed

17 citations (Scopus)

Abstract

Researchers concerned with the automatic recognition of human emotion in speech have proposed a considerable variety of segmental and supra-segmental acoustic descriptors. These range from prosodic characteristics and voice quality to acoustic correlates of articulation, and represent unequal degrees of perceptual elaboration. Recently, evidence has been reported from first comparisons on multiple speech databases that spectral and cepstral characteristics might have the greatest potential for the task. Yet novel acoustic correlates are constantly proposed, as the question of the optimal representation remains disputed. The task of evaluating suggested correlates is non-trivial, as no agreed "standard" set and method of assessment exists, and inter-corpus substantiation is usually lacking. Such substantiation is particularly difficult owing to the divergence of models employed for the ground-truth description of emotion. To ease this challenge, the arousal-valence space is proposed as the predominant means of mapping information stemming from diverse speech resources, including acted and spontaneous speech with variable and fixed phonetic content, onto well-defined binary tasks. The acoustic baseline feature sets of all six past emotion and paralinguistics challenges are evaluated systematically on eight standard speech-emotion corpora in order to assess the power of each feature set for different types of data.

Original language: English
Article number: 060006
Journal: Proceedings of Meetings on Acoustics
Volume: 9
Publication status: Published - 2010
Event: 159th Meeting Acoustical Society of America/NOISE-CON 2010 - Baltimore, MD, United States
Duration: 19 Apr. 2010 to 23 Apr. 2010
