Speech Driven Talking Face Generation From a Single Image and an Emotion Condition. - Citegraph

Paper Info

Title
Speech Driven Talking Face Generation From a Single Image and an Emotion Condition.

Abstract
Visual emotion expression plays an important role in audiovisual speech communication. In this work, we propose a novel approach to rendering visual emotion expression in speech-driven talking face generation. Specifically, we design an end-to-end talking face generation system that takes a speech utterance, a single face image, and a categorical emotion label as input to render a talking face video in sync with the speech and expressing the conditioned emotion. Objective evaluation on image quality, audiovisual synchronization, and visual emotion expression shows that the proposed system outperforms a state-of-the-art baseline system. Subjective evaluation of visual emotion expression and video realness also demonstrates the superiority of the proposed system. Furthermore, we conduct a pilot study on human emotion recognition of generated videos with mismatched emotions between the audio and visual modalities, and results show that humans rely on the visual modality more significantly than the audio modality on this task.

Year	DOI	Venue
2022	10.1109/TMM.2021.3099900	IEEE Transactions on Multimedia
DocType	Volume	ISSN
Journal	24	1520-9210
Citations	PageRank	References
0	0.34	0
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Eskimez, S.E.	1	15	5.34
You Zhang	2	1	2.71
Zhiyao Duan	3	305	26.86
