A Mouth Full Of Words: Visually Consistent Acoustic Redubbing - Citegraph

Paper Info

Title
A Mouth Full Of Words: Visually Consistent Acoustic Redubbing

Abstract
This paper introduces a method for automatic redubbing of video that exploits the many-to-many mapping of phoneme sequences to lip movements modelled as dynamic visemes [f]. For a given utterance, the corresponding dynamic viseme sequence is sampled to construct a graph of possible phoneme sequences that synchronize with the video. When composed with a pronunciation dictionary and language model, this produces a vast number of word sequences that are in sync with the original video, literally putting plausible words into the mouth of the speaker. We demonstrate that traditional, many-to-one, static visemes lack flexibility for this application as they produce significantly fewer word sequences. This work explores the natural ambiguity in visual speech and offers insight for automatic speech recognition and the importance of language modeling.
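
As a rough illustration of the idea in the abstract, the sketch below expands a viseme sequence into every phoneme sequence it could correspond to and keeps those that a pronunciation dictionary maps to a word. It is a toy under stated assumptions, not the authors' implementation: the viseme labels, the `VISEME_TO_PHONES` table, the `LEXICON` entries, and the `candidate_words` helper are all hypothetical, and it uses brute-force enumeration for single words where the paper describes graph construction composed with a dictionary and a language model.

```python
from itertools import product

# Hypothetical viseme inventory: each visual unit lists the phoneme chunks
# that could have produced it. Chunks are tuples so that one unit may span
# several phonemes, as a dynamic viseme can.
VISEME_TO_PHONES = {
    "lips_closed": [("p",), ("b",), ("m",)],
    "jaw_open": [("ae",)],
    "tongue_tip": [("t",), ("d",), ("n",)],
}

# Hypothetical pronunciation dictionary: phoneme tuple -> words.
LEXICON = {
    ("p", "ae", "t"): ["pat"],
    ("b", "ae", "t"): ["bat"],
    ("m", "ae", "t"): ["mat"],
    ("b", "ae", "d"): ["bad"],
    ("m", "ae", "n"): ["man"],
    ("k", "ae", "t"): ["cat"],
}


def candidate_words(viseme_seq, viseme_to_phones, lexicon):
    """Return the sorted words whose pronunciation fits the viseme sequence."""
    options = [viseme_to_phones[v] for v in viseme_seq]
    words = set()
    # Enumerate one phoneme chunk per viseme, then flatten into a sequence.
    for choice in product(*options):
        phones = tuple(p for chunk in choice for p in chunk)
        words.update(lexicon.get(phones, ()))
    return sorted(words)


result = candidate_words(
    ["lips_closed", "jaw_open", "tongue_tip"], VISEME_TO_PHONES, LEXICON
)
print(result)
```

In this toy, one three-unit visual sequence is consistent with several different words, which is the ambiguity the redubbing method exploits. A language model, omitted here, would then rank the resulting word sequences by plausibility.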

Year	Venue	Keywords
2015	2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)	Audio-visual speech, dynamic visemes, acoustic redubbing
Field	DocType	ISSN
Speech corpus, Pronunciation, Speech processing, Speech analytics, Viseme, Audio mining, Computer science, Speech recognition, Natural language processing, Artificial intelligence, Language model, Acoustic model	Conference	1520-6149
Citations	PageRank	References
2	0.37	6
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Sarah L. Taylor	1	67	4.77
Barry-John Theobald	2	332	25.39
Iain Matthews	3	4900	253.61
