Title
Expressive Talking Head Generation with Granular Audio-Visual Control
Abstract
Generating expressive talking heads is essential for creating virtual humans. However, existing one- or few-shot methods focus on lip-sync and head motion while ignoring the emotional expressions that make talking faces realistic. In this paper, we propose Granularly Controlled Audio-Visual Talking Heads (GC-AVT), which controls the lip movements, head poses, and facial expressions of a talking head in a granular manner. Our key insight is to decouple the audio-visual driving sources through prior-based pre-processing designs. Specifically, we disassemble the driving image into three complementary parts: 1) a cropped mouth that facilitates lip-sync; 2) a masked head that implicitly learns pose; and 3) the upper face, which works jointly and complementarily with a time-shifted mouth to contribute the expression. The encoded features from the three sources are then balanced through reconstruction training. Extensive experiments show that our method generates expressive faces with not only synced mouth shapes and controllable poses, but also precisely animated emotional expressions.
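To make the pre-processing described in the abstract concrete, the following is a minimal sketch of how a driving clip might be split into the three complementary streams (a cropped mouth for lip-sync, a mouth-masked head for pose, and the upper face paired with a time-shifted mouth for expression). The NumPy frames, function names, crop coordinates, and shift value are illustrative assumptions, not taken from the paper or any released code.

# Hypothetical decomposition of a driving clip into the three streams named in
# the abstract. Region boundaries and the time shift are placeholder values.
import numpy as np

MOUTH_BOX = (96, 160, 40, 120)  # (top, bottom, left, right), assumed layout

def crop_mouth(frame, box=MOUTH_BOX):
    """Stream 1: the mouth crop that drives lip-sync."""
    top, bottom, left, right = box
    return frame[top:bottom, left:right]

def mask_mouth(frame, box=MOUTH_BOX):
    """Stream 2: the head with the mouth zeroed out, so it mainly carries pose."""
    masked = frame.copy()
    top, bottom, left, right = box
    masked[top:bottom, left:right] = 0.0
    return masked

def upper_face(frame, split_row=96):
    """Upper half of the face, used for expression cues."""
    return frame[:split_row]

def expression_stream(frames, shift=5):
    """Stream 3: pair each upper face with a time-shifted mouth crop, so the
    mouth no longer matches the current phoneme and mostly conveys expression."""
    n = len(frames)
    return [(upper_face(frames[t]), crop_mouth(frames[(t + shift) % n]))
            for t in range(n)]

if __name__ == "__main__":
    clip = np.random.rand(16, 160, 160, 3)        # toy 16-frame driving clip
    lips = [crop_mouth(f) for f in clip]          # lip-sync stream
    pose = [mask_mouth(f) for f in clip]          # pose stream
    expr = expression_stream(clip)                # expression stream
    print(lips[0].shape, pose[0].shape, expr[0][0].shape, expr[0][1].shape)

In the paper's framing, the features encoded from these three sources would then be balanced through reconstruction training; this sketch covers only the decomposition step.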
Year
2022
DOI
10.1109/CVPR52688.2022.00338
Venue
IEEE Conference on Computer Vision and Pattern Recognition
Keywords
Image and video synthesis and generation
DocType
Conference
Volume
2022
Issue
1
Citations
0
PageRank
0.34
References
0
Authors
10
Name            Order  Citations  PageRank
Borong Liang    1      0          0.34
Yan Pan         2      0          0.34
Zhizhi Guo      3      0          0.68
Hang Zhou       4      0          0.68
Zhibin Hong     5      37         4.94
Xiaoguang Han   6      220        29.01
Junyu Han       7      85         11.12
Jingtuo Liu     8      47         9.43
Er-rui Ding     9      142        29.31
Jingdong Wang   10     0          0.34