Emotion Recognition from Speech Signals by Mel-Spectrogram and a CNN-RNN

Roneel V. Sharan, Cecilia Mascolo, Björn W. Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Speech emotion recognition (SER) in health applications can provide insights into the emotional well-being of individuals. In this work, we propose a method for SER using a time-frequency representation of speech signals and neural networks. In particular, we divide each speech signal into overlapping segments and transform each segment into a Mel-spectrogram. Each Mel-spectrogram forms the input to YAMNet, a pretrained convolutional neural network for audio classification, which learns the spectral characteristics within that Mel-spectrogram. In addition, we use a long short-term memory network, a type of recurrent neural network, to learn the temporal dependencies across the sequence of Mel-spectrograms in each speech signal. The proposed method is evaluated on the angry, happy, and sad emotion types and the neutral expression on two SER datasets, achieving average accuracies of 0.711 and 0.780, respectively. These results represent a relative improvement over baseline methods and demonstrate the potential of our method for detecting emotional states from speech signals.
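The front end described in the abstract (overlapping segments, each converted to a Mel-spectrogram) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names are hypothetical, and every numeric value (16 kHz sampling rate, 0.96 s segments with 50% overlap, 25 ms window, 10 ms hop, 64 Mel bands) is an assumption chosen for illustration, since the abstract does not state them. The YAMNet and long short-term memory stages are omitted.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the Mel scale from 0 Hz to sr/2."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i:i + 3]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def split_segments(signal, seg_len, hop):
    """Cut a 1-D signal into overlapping segments (zero-pad if too short)."""
    if len(signal) < seg_len:
        signal = np.pad(signal, (0, seg_len - len(signal)))
    n = 1 + (len(signal) - seg_len) // hop
    return np.stack([signal[i * hop:i * hop + seg_len] for i in range(n)])

def log_mel_spectrogram(segment, sr, n_fft=512, win=400, hop=160, n_mels=64):
    """Log-Mel-spectrogram of one segment, shape (n_frames, n_mels)."""
    n_frames = 1 + (len(segment) - win) // hop
    frames = np.stack([segment[i * hop:i * hop + win] for i in range(n_frames)])
    frames = frames * np.hanning(win)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-6)  # small offset avoids log(0)

# Assumed values for illustration only.
sr = 16000
signal = np.sin(2 * np.pi * 440 * np.arange(2 * sr) / sr)  # 2 s test tone
segments = split_segments(signal, seg_len=15360, hop=7680)  # 0.96 s, 50% overlap
features = np.stack([log_mel_spectrogram(s, sr) for s in segments])
print(features.shape)  # (segments, frames, Mel bands)
```

In a full pipeline along the lines of the abstract, each slice `features[i]` would be passed to the pretrained convolutional network, and the resulting per-segment outputs would form the input sequence of the recurrent network.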

Original language: English
Title of host publication: 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350371499
State: Published - 2024
Externally published: Yes
Event: 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2024 - Orlando, United States
Duration: 15 Jul 2024 to 19 Jul 2024

Publication series

Name: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS
ISSN (Print): 1557-170X

Conference

Conference: 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2024
Country/Territory: United States
City: Orlando
Period: 15/07/24 to 19/07/24

Keywords

  • Convolutional neural network
  • Mel-spectrogram
  • recurrent neural network
  • speech emotion recognition
