Synthesizing Talking Face from Text and Audio via Autoencoder and Sequence-to-Sequence Convolutional Neural Networks - Citegraph

Paper Info

Title
Synthesizing Talking Face from Text and Audio via Autoencoder and Sequence-to-Sequence Convolutional Neural Networks

Abstract
• An effective landmark localization pipeline, based on landmark detection, optical flow estimation, and Kalman filtering, is proposed to avoid face shake.
• A part-based autoencoder is introduced to learn low-dimensional representations of different face regions.
• A sequence-to-sequence convolutional neural network with residual units is proposed to learn the mapping from phonemes to facial codes.
• The method is tested on two public audio-visual datasets and a new dataset called Chinese CCTV News; the results demonstrate its effectiveness compared with other state-of-the-art methods.
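
To make the first highlight more concrete, here is a minimal sketch of how a Kalman filter can damp frame-to-frame jitter in a tracked landmark coordinate. It is not the authors' implementation: the abstract says only that landmark detection, optical flow estimation, and a Kalman filter are combined, so the scalar random-walk motion model, the function name `kalman_smooth`, the noise variances, and the synthetic data are all assumptions made for illustration.

```python
import random


def kalman_smooth(measurements, process_var=1e-3, measurement_var=1.0):
    """Filter a 1-D sequence with a scalar random-walk Kalman filter.

    Assumed model (not taken from the paper): the true coordinate drifts
    slowly (variance process_var per frame) and each detection adds
    independent noise with variance measurement_var.
    """
    if not measurements:
        return []
    estimate = measurements[0]   # initial state: the first observation
    error_var = measurement_var  # initial uncertainty of that state
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed to stay put, so only uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def variance(values):
    """Population variance, used here only to compare jitter levels."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical x-coordinate of one landmark on a still face, with detector jitter.
rng = random.Random(0)
noisy_x = [100.0 + rng.gauss(0.0, 2.0) for _ in range(200)]
smooth_x = kalman_smooth(noisy_x, process_var=1e-3, measurement_var=4.0)
```

Applying the same filter independently to each coordinate of each landmark is the simplest option. A constant-velocity state, or a prediction step driven by optical flow, would likely track real head motion better, but the abstract does not say which variant the paper uses.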

Year: 2020
DOI: 10.1016/j.patcog.2020.107231
Venue: Pattern Recognition
Keywords: Convolutional neural network, Autoencoder, Regression, Face landmark, Face tracking, Lip sync, Video, Audio
DocType: Journal
Volume: 102
Issue: C
ISSN: 0031-3203
Citations: 1
PageRank: 0.35
References: 0
Authors: 5

Authors (5 rows)

Name	Order	Citations	PageRank
Na Liu	1	18	3.06
Tao Zhou	2	209	21.34
Yunfeng Ji	3	8	5.19
Ziyi Zhao	4	1	0.35
Lihong Wan	5	12	3.54
