From Speech to Facial Activity: Towards Cross-modal Sequence-to-Sequence Attention Networks

Lukas Stappen, Vincent Karas, Nicholas Cummins, Fabien Ringeval, Klaus Scherer, Björn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

Multimodal data sources offer the possibility to capture and model interactions between modalities, leading to an improved understanding of underlying relationships. In this regard, the work presented in this paper explores the relationship between facial muscle movements and speech signals. Specifically, we explore the efficacy of different sequence-to-sequence neural network architectures for the task of predicting Facial Action Coding System Action Units (AUs) from one of two acoustic feature representations extracted from speech signals, namely the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) or the Interspeech Computational Paralinguistics Challenge feature set (ComParE). Furthermore, these architectures were enhanced by two different attention mechanisms (intra- and inter-attention) and various state-of-the-art network settings to improve prediction performance. Results indicate that a sequence-to-sequence model with inter-attention can achieve on average an Unweighted Average Recall (UAR) of 65.9% for AU onset, 67.8% for AU apex (both eGeMAPS), 79.7% for AU offset and 65.3% for AU occurrence (both ComParE) detection over all AUs.
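The inter-attention mechanism mentioned in the abstract lets each decoder step attend over the full encoded acoustic sequence when predicting AU states. The following is a minimal NumPy sketch of scaled dot-product cross-attention under that reading; all dimensions, variable names, and the random inputs are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_attention(decoder_states, encoder_states):
    """Cross-modal (inter-) attention sketch: every decoder step scores
    all encoder steps; contexts are attention-weighted encoder states."""
    d = encoder_states.shape[-1]
    scores = decoder_states @ encoder_states.T / np.sqrt(d)  # (T_dec, T_enc)
    weights = softmax(scores, axis=-1)                       # rows sum to 1
    return weights @ encoder_states, weights                 # context vectors

# Toy sizes (hypothetical): an encoded acoustic sequence attended to by
# a shorter decoder sequence that would emit AU predictions.
rng = np.random.default_rng(0)
T_enc, T_dec, d = 50, 10, 16
enc = rng.standard_normal((T_enc, d))   # encoder outputs (acoustic side)
dec = rng.standard_normal((T_dec, d))   # decoder hidden states (AU side)
context, weights = inter_attention(dec, enc)
print(context.shape)                    # (10, 16)
```

In a full model the context vectors would be concatenated with the decoder states before the AU classification layer; here they are simply returned to show the attention shape.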

Original language: English
Title of host publication: IEEE 21st International Workshop on Multimedia Signal Processing, MMSP 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728118178
State: Published - Sep 2019
Externally published: Yes
Event: 21st IEEE International Workshop on Multimedia Signal Processing, MMSP 2019 - Kuala Lumpur, Malaysia
Duration: 27 Sep 2019 - 29 Sep 2019

Publication series

Name: IEEE 21st International Workshop on Multimedia Signal Processing, MMSP 2019

Conference

Conference: 21st IEEE International Workshop on Multimedia Signal Processing, MMSP 2019
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 27/09/19 - 29/09/19

Keywords

  • attention networks
  • facial action units
  • paralinguistics
  • sequence to sequence
