Abstract
We present ObamaNet, the first architecture that generates both audio and synchronized, photo-realistic lip-sync video from any new text. In contrast to other published lip-sync approaches, ours is composed entirely of fully trainable neural modules and does not rely on any traditional computer-graphics methods. More precisely, we use three main modules: a text-to-speech network based on Char2Wav, a time-delayed LSTM that generates mouth keypoints synced to the audio, and a network based on Pix2Pix that generates the video frames conditioned on those keypoints.
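To make the middle stage of the pipeline concrete, below is a minimal sketch of a time-delayed LSTM keypoint predictor in PyTorch. This is not the paper's implementation: the class name, the input/output dimensions, and the delay length are all illustrative assumptions. We assume log-mel audio frames as input and a flattened vector of mouth-keypoint coordinates as output.

```python
import torch
import torch.nn as nn

class DelayedKeypointLSTM(nn.Module):
    """Hypothetical sketch of the audio-to-keypoint stage: an LSTM reads
    audio features and emits mouth keypoints with a fixed output delay,
    so each predicted frame has already seen some future audio context."""

    def __init__(self, audio_dim=80, keypoint_dim=40, hidden_dim=256, delay=20):
        super().__init__()
        self.delay = delay  # look-ahead in frames (assumed value)
        self.lstm = nn.LSTM(audio_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, keypoint_dim)

    def forward(self, audio_feats):
        # audio_feats: (batch, time, audio_dim), e.g. log-mel frames
        hidden, _ = self.lstm(audio_feats)
        keypoints = self.proj(hidden)
        # Treat the output at step t as the keypoints for step t - delay:
        # dropping the first `delay` outputs aligns predictions with
        # targets while giving the LSTM `delay` frames of look-ahead.
        return keypoints[:, self.delay:, :]

model = DelayedKeypointLSTM()
audio = torch.randn(2, 120, 80)   # 2 clips, 120 audio frames each
mouth = model(audio)              # -> (2, 100, 40) keypoint vectors
```

In this sketch, the predicted keypoints would then condition the Pix2Pix-based frame generator. The delay gives the recurrent network access to upcoming audio before committing to a mouth shape, which is the usual motivation for time-delayed designs.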
Year | Venue | Field
---|---|---
2018 | arXiv: Computer Vision and Pattern Recognition | Computer vision, Architecture, Pattern recognition, Computer science, Artificial intelligence, Computer graphics, Lip sync

DocType | Volume | Citations
---|---|---
Journal | abs/1801.01442 | 3

PageRank | References | Authors
---|---|---
0.39 | 2 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Rithesh Kumar | 1 | 8 | 1.81 |
Jose Sotelo | 2 | 9 | 2.16 |
Kundan Kumar | 3 | 10 | 5.89 |
Alexandre de Brébisson | 4 | 5 | 0.76 |
Yoshua Bengio | 5 | 42677 | 3039.83 |