Expressive Talking Head Generation with Granular Audio-Visual Control - Citegraph

Paper Info

Title
Expressive Talking Head Generation with Granular Audio-Visual Control

Abstract
Generating expressive talking heads is essential for creating virtual humans. However, existing one- or few-shot methods focus on lip-sync and head motion, ignoring the emotional expressions that make talking faces realistic. In this paper, we propose the Granularly Controlled Audio-Visual Talking Heads (GC-AVT), which controls lip movements, head poses, and facial expressions of a talking head in a granular manner. Our insight is to decouple the audio-visual driving sources through prior-based pre-processing designs. Specifically, we disassemble the driving image into three complementary parts: 1) a cropped mouth that facilitates lip-sync; 2) a masked head that implicitly learns pose; and 3) the upper face, which works cooperatively and complementarily with a time-shifted mouth to contribute the expression. Interestingly, the encoded features from the three sources are integrally balanced through reconstruction training. Extensive experiments show that our method generates expressive faces with not only synced mouth shapes and controllable poses but also precisely animated emotional expressions.

Field	Value
Year	2022
DOI	10.1109/CVPR52688.2022.00338
Venue	IEEE Conference on Computer Vision and Pattern Recognition
Keywords	Image and video synthesis and generation
DocType	Conference
Volume	2022
Issue	1
Citations	0
PageRank	0.34
References	0
Authors	10

Authors (10 rows)

Name	Order	Citations	PageRank
Borong Liang	1	0	0.34
Yan Pan	2	0	0.34
Zhizhi Guo	3	0	0.68
Hang Zhou	4	0	0.68
Zhibin Hong	5	37	4.94
Xiaoguang Han	6	220	29.01
Junyu Han	7	85	11.12
Jingtuo Liu	8	47	9.43
Er-rui Ding	9	142	29.31
Jingdong Wang	10	0	0.34
