Title
MakeItTalk: speaker-aware talking-head animation
Abstract
We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings from audio to raw pixels for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting the speaker-aware dynamics. Based on this intermediate representation, our method works with many portrait images in a single unified framework, including artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures. In addition, our method generalizes well to faces and characters that were not observed during training. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking-heads of significantly higher quality compared to prior state-of-the-art methods.
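The abstract describes a two-branch pipeline: an audio content branch that drives lip and nearby facial motion, and a speaker-identity branch that modulates expressions and head dynamics, with predicted facial landmarks as the intermediate representation that later drives image warping or rendering. The following PyTorch sketch is purely illustrative; the module names, feature dimensions, and the LSTM/MLP choices are assumptions for clarity, not the authors' released architecture.

# Minimal sketch of the two-branch, landmark-based pipeline described above.
# All layer sizes and module names are illustrative assumptions.
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Maps per-frame audio features to a content embedding that drives lip motion."""
    def __init__(self, audio_dim=80, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(audio_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, audio_feats):           # (B, T, audio_dim)
        out, _ = self.rnn(audio_feats)        # (B, T, 2*hidden)
        return out

class LandmarkPredictor(nn.Module):
    """Predicts per-frame 3D landmark displacements from the audio content
    embedding concatenated with a global speaker-identity embedding."""
    def __init__(self, content_dim=512, speaker_dim=128, n_landmarks=68):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(content_dim + speaker_dim, 256), nn.ReLU(),
            nn.Linear(256, n_landmarks * 3),  # x, y, z offset per landmark
        )

    def forward(self, content, speaker):      # content: (B, T, C), speaker: (B, S)
        T = content.size(1)
        spk = speaker.unsqueeze(1).expand(-1, T, -1)
        x = torch.cat([content, spk], dim=-1)
        return self.mlp(x)                    # (B, T, n_landmarks * 3)

# Usage sketch: the static landmarks detected in the input portrait are displaced
# frame by frame, and the displaced landmarks then drive the warping/rendering stage.
audio_feats = torch.randn(1, 100, 80)         # e.g. 100 frames of spectral features
speaker_emb = torch.randn(1, 128)              # identity embedding from a speaker encoder
content = ContentEncoder()(audio_feats)
offsets = LandmarkPredictor()(content, speaker_emb)
print(offsets.shape)                           # torch.Size([1, 100, 204])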
Year
2020
DOI
10.1145/3414685.3417774
Venue
ACM Transactions on Graphics
Keywords
Facial Animation, Neural Networks
DocType
Journal
Volume
39
Issue
6
ISSN
0730-0301
Citations
6
PageRank
0.47
References
23
Authors
6
Name                    Order  Citations  PageRank
Yang Zhou               1      102        6.41
Xintong Han             2      140        13.20
Eli Shechtman           3      4340       177.94
Jose I. Echevarria      4      78         9.52
Evangelos Kalogerakis   5      1377       53.82
Dingzeyu Li             6      112        7.31