VENTRILOQUIST-NET: LEVERAGING SPEECH CUES FOR EMOTIVE TALKING HEAD GENERATION

Deepan Das, Qadeer Khan, Daniel Cremers

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we propose Ventriloquist-Net: a talking head generation model that uses only a speech segment and a single source face image, with an emphasis on emotive expressions. Cues for generating these expressions are inferred implicitly from the speech clip alone. We formulate our framework as a set of independently trained modules to expedite convergence. This not only allows extension to datasets in a semi-supervised manner but also facilitates handling in-the-wild source images. Quantitative and qualitative evaluations of the generated videos demonstrate state-of-the-art performance even on unseen input data. The implementation and supplementary videos are available at https://github.com/dipnds/VentriloquistNet.
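The abstract describes a two-stage pipeline: expression cues are inferred from the speech clip alone, and a generator animates the single source image accordingly. The following minimal sketch illustrates that interface only; all class names, signatures, and the placeholder cue/rendering logic are illustrative assumptions, not the paper's actual architecture or API.

```python
# Hypothetical sketch of the pipeline described in the abstract: a speech
# encoder infers per-frame expression cues from audio only, and a generator
# animates a single source face image from those cues. All names and the
# placeholder computations below are assumptions for illustration.

from dataclasses import dataclass
from typing import List


@dataclass
class SpeechEncoder:
    """Stands in for the module that infers expression cues from speech."""
    cue_dim: int = 8

    def encode(self, speech_samples: List[float], num_frames: int) -> List[List[float]]:
        # Placeholder: mean per-frame signal energy stands in for learned cues.
        hop = max(1, len(speech_samples) // num_frames)
        cues = []
        for f in range(num_frames):
            window = speech_samples[f * hop:(f + 1) * hop] or [0.0]
            energy = sum(abs(s) for s in window) / len(window)
            cues.append([energy] * self.cue_dim)
        return cues


@dataclass
class FaceGenerator:
    """Stands in for the module that animates the single source image."""

    def generate(self, source_image: List[List[float]],
                 cues: List[List[float]]) -> List[List[List[float]]]:
        # Placeholder: modulate pixel intensity by the cue magnitude so each
        # cue vector yields one distinct output frame.
        frames = []
        for cue in cues:
            gain = 1.0 + cue[0]
            frames.append([[min(1.0, px * gain) for px in row]
                           for row in source_image])
        return frames


def talking_head(source_image, speech_samples, num_frames=4):
    """End-to-end inference: speech + one face image -> a frame sequence."""
    cues = SpeechEncoder().encode(speech_samples, num_frames)
    return FaceGenerator().generate(source_image, cues)
```

Because the paper trains such modules independently, each stage can in principle be swapped or retrained on its own; the sketch mirrors that by keeping the encoder and generator as separate objects composed only at inference time.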

Original language: English
Title of host publication: 2022 IEEE International Conference on Image Processing, ICIP 2022 - Proceedings
Publisher: IEEE Computer Society
Pages: 1716-1720
Number of pages: 5
ISBN (Electronic): 9781665496209
DOIs
State: Published - 2022
Event: 29th IEEE International Conference on Image Processing, ICIP 2022 - Bordeaux, France
Duration: 16 Oct 2022 - 19 Oct 2022

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 29th IEEE International Conference on Image Processing, ICIP 2022
Country/Territory: France
City: Bordeaux
Period: 16/10/22 - 19/10/22

Keywords

  • Speech Emotion
  • Talking Head Generation
