Title
Accurate visual speech synthesis based on diviseme unit selection and concatenation
Abstract
This paper presents a novel speech-driven approach to accurate, realistic visual speech synthesis. First, an audio-visual instance database is built for different viseme context combinations, i.e. diviseme units, using 100 audio-visual speech sentences of a female speaker. Then a diviseme instance selection algorithm is introduced to choose the optimal diviseme instances for the viseme contexts in the input speech, considering the concatenation smoothness of the image sequences, the matching of the mouth movements to the acoustic pronunciation process, and the intensity of the input speech. Finally, the mouth image sequences of corresponding viseme segments in the selected diviseme instances are time-warped and blended to construct the mouth images of the final animation. Visual speech synthesis experiments and subjective evaluation results show that the resulting mouth animations are not only realistic, with clear and smooth mouth images, but also in good accordance with the acoustic pronunciation and intensity of the input speech.
Year
2008
DOI
10.1109/MMSP.2008.4665203
Venue
MMSP
Keywords
mouth animations, image matching, audio visual instance database, acoustic pronunciation process, mouth movements matching, visual speech synthesis approach, image segmentation, visual communication, speech synthesis, image sequences, diviseme instance selection algorithm, image motion analysis, speech, trajectory, tracking, acoustics, animation, visualization
Field
Pronunciation, Computer science, Viseme, Image segmentation, Artificial intelligence, Concatenation, Computer vision, Speech synthesis, Pattern recognition, Visualization, Speech recognition, Visual communication, Animation
DocType
Conference
ISBN
978-1-4244-2295-1
Citations
2
PageRank
0.39
References
7
Authors
4
Name             Order   Citations   PageRank
Jiang Dongmei    1       115         15.28
Ravyse Ilse      2       43          6.24
Hichem Sahli     3       475         65.19
Yanning Zhang    4       1613        176.32