Title | ||
---|---|---|
Synthesizing Talking Face from Text and Audio via Autoencoder and Sequence-to-Sequence Convolutional Neural Networks |
Abstract | ||
---|---|---|
• An effective landmark localization pipeline based on landmark detection, optical flow estimation, and a Kalman filter is proposed to avoid face shake. • A part-based autoencoder is introduced to learn low-dimensional representations of different face regions. • A sequence-to-sequence convolutional neural network with residual units is proposed to learn the mapping from phonemes to facial codes. • The method is tested on two public audio-visual datasets and on a new dataset called Chinese CCTV News, and the results demonstrate its effectiveness compared with other state-of-the-art methods. |
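The first highlight combines per-frame landmark detection, optical flow, and a Kalman filter to suppress face shake. The record does not describe the authors' filter design, so the following is only a minimal sketch of the Kalman-filter smoothing step. It assumes a constant-velocity motion model applied independently to each landmark coordinate. The function names `kalman_smooth_track` and `smooth_landmark` and the noise parameters `q` and `r` are hypothetical, chosen for illustration rather than taken from the paper.

```python
from typing import List, Tuple


def kalman_smooth_track(measurements: List[float], q: float = 1e-3, r: float = 1.0) -> List[float]:
    """Filter a noisy 1-D landmark coordinate track with a constant-velocity Kalman filter.

    Assumption: q (process noise, added to both state variances) and r (measurement
    noise) are illustrative values, not parameters from the paper.
    """
    if not measurements:
        return []
    # State: position p and velocity v (pixels per frame); P is the 2x2 state covariance.
    p, v = measurements[0], 0.0
    P = [[1.0, 0.0], [0.0, 1.0]]
    out = [p]
    for z in measurements[1:]:
        # Predict with F = [[1, 1], [0, 1]] (one time step per frame): P <- F P F^T + q*I.
        p, v = p + v, v
        P = [
            [P[0][0] + P[0][1] + P[1][0] + P[1][1] + q, P[0][1] + P[1][1]],
            [P[1][0] + P[1][1], P[1][1] + q],
        ]
        # Update with H = [1, 0]: only the position is observed in each frame.
        s = P[0][0] + r                      # innovation covariance
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain for position and velocity
        y = z - p                            # innovation (measurement residual)
        p, v = p + k0 * y, v + k1 * y
        # P <- (I - K H) P, evaluated from the predicted P before reassignment.
        P = [
            [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
            [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
        ]
        out.append(p)
    return out


def smooth_landmark(track: List[Tuple[float, float]], q: float = 1e-3, r: float = 1.0) -> List[Tuple[float, float]]:
    """Smooth one landmark's (x, y) trajectory by filtering each coordinate separately."""
    xs = kalman_smooth_track([pt[0] for pt in track], q, r)
    ys = kalman_smooth_track([pt[1] for pt in track], q, r)
    return list(zip(xs, ys))
```

For example, a landmark whose detected x-coordinate alternates between 9 and 11 from frame to frame settles near 10 after filtering, which is the kind of frame-to-frame jitter reduction the highlight refers to. Filtering x and y independently is the simplest choice. A fuller pipeline might instead fuse the optical-flow prediction into the predict step.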
Year | DOI | Venue |
---|---|---|
2020 | 10.1016/j.patcog.2020.107231 | Pattern Recognition |
Keywords | DocType | Volume |
Convolutional neural network, Autoencoder, Regression, Face landmark, Face tracking, Lip sync, Video, Audio | Journal | 102 |
Issue | ISSN | Citations |
C | 0031-3203 | 1 |
PageRank | References | Authors |
0.35 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Na Liu | 1 | 18 | 3.06 |
Tao Zhou | 2 | 209 | 21.34 |
Yunfeng Ji | 3 | 8 | 5.19 |
Ziyi Zhao | 4 | 1 | 0.35 |
Lihong Wan | 5 | 12 | 3.54 |