Title
Synthesizing Talking Face from Text and Audio via Autoencoder and Sequence-to-Sequence Convolutional Neural Networks
Abstract
• An effective landmark localization pipeline, based on landmark detection, optical flow estimation, and a Kalman filter, is proposed to avoid face shake.
• A part-based autoencoder is introduced to learn low-dimensional representations of different face regions.
• A sequence-to-sequence convolutional neural network with residual units is proposed to learn the mapping from phonemes to facial codes.
• Experiments on two public audio-visual datasets and a new dataset, Chinese CCTV News, demonstrate the effectiveness of the proposed method against other state-of-the-art methods.
Year: 2020
DOI: 10.1016/j.patcog.2020.107231
Venue: Pattern Recognition
Keywords: Convolutional neural network, Autoencoder, Regression, Face landmark, Face tracking, Lip sync, Video, Audio
DocType: Journal
Volume: 102
Issue: C
ISSN: 0031-3203
Citations: 1
PageRank: 0.35
References: 0
Authors: 5
Name | Order | Citations | PageRank
Na Liu | 1 | 18 | 3.06
Tao Zhou | 2 | 209 | 21.34
Yunfeng Ji | 3 | 8 | 5.19
Ziyi Zhao | 4 | 1 | 0.35
Lihong Wan | 5 | 12 | 3.54