Neural Voice Puppetry: Audio-Driven Facial Reenactment

Justus Thies, Mohamed Elgharib, Ayush Tewari, Christian Theobalt, Matthias Nießner

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

224 Scopus citations

Abstract

We present Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis (Video, Code and Demo: https://justusthies.github.io/posts/neural-voice-puppetry/). Given an audio sequence of a source person or digital assistant, we generate a photo-realistic output video of a target person that is in sync with the audio of the source input. This audio-driven facial reenactment is driven by a deep neural network that employs a latent 3D face model space. Through the underlying 3D representation, the model inherently learns temporal stability while we leverage neural rendering to generate photo-realistic output frames. Our approach generalizes across different people, allowing us to synthesize videos of a target actor with the voice of any unknown source actor or even synthetic voices that can be generated utilizing standard text-to-speech approaches. Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head. We demonstrate the capabilities of our method in a series of audio- and text-based puppetry examples, including comparisons to state-of-the-art techniques and a user study.

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2020 - 16th European Conference, Proceedings
EditorsAndrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm
PublisherSpringer Science and Business Media Deutschland GmbH
Pages716-731
Number of pages16
ISBN (Print)9783030585167
DOIs
StatePublished - 2020
Event16th European Conference on Computer Vision, ECCV 2020 - Glasgow, United Kingdom
Duration: 23 Aug 202028 Aug 2020

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12361 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference16th European Conference on Computer Vision, ECCV 2020
Country/TerritoryUnited Kingdom
CityGlasgow
Period23/08/2028/08/20

Fingerprint

Dive into the research topics of 'Neural Voice Puppetry: Audio-Driven Facial Reenactment'. Together they form a unique fingerprint.

Cite this