Title | ||
---|---|---|
Realistic mouth animation based on an articulatory DBN model with constrained asynchrony |
Abstract | ||
---|---|---|
In this paper, we propose an approach to convert acoustic speech into video-realistic mouth animation based on an articulatory dynamic Bayesian network model with constrained asynchrony (AF_AVDBN). Conditional probability distributions are defined to control the asynchronies between articulators such as the lips, tongue, and glottis/velum. An EM-based conversion algorithm is also presented to learn the optimal visual features given an auditory input and the trained AF_AVDBN parameters. In training the AF_AVDBN models, downsampled YUV spatial frequency features of the interpolated mouth image sequences are extracted as visual features. To reproduce the mouth animation sequence from the learned visual features, spatial upsampling and temporal downsampling are applied. Both qualitative and quantitative results show that the proposed method produces more natural and realistic mouth animations, and that its accuracy improves on the state-of-the-art multi-stream hidden Markov model (MSHMM) and on the articulatory DBN model without the asynchrony constraint (AF_DBN). |
Year | DOI | Venue |
---|---|---|
2010 | 10.1109/ICASSP.2010.5494894 | ICASSP |
Keywords | Field | DocType |
constrained asynchrony,statistical distributions,belief networks,speech processing,em-based conversion algorithm,optimal visual features,multistream hidden markov model,interpolation,hearing,computer animation,asynchrony,articulatory dynamic bayesian network model,af-dbn,interpolated mouth image sequences,af_avdbn,af-avdbn,feature extraction,acoustic speech,image sequences,constraint handling,spatial upsampling,video realistic mouth animation,probability distributions,af_dbn,temporal downsampling,mouth animation,conditional probability distribution,auditory input,hidden markov models,downsampled yuv spatial frequency,articulatory dbn model,dynamic bayesian network,visualization,animation,conditional probability,speech,probability distribution,bayesian methods,hidden markov model,spatial frequency,frequency | Speech processing,Pattern recognition,Visualization,Computer science,Speech recognition,Feature extraction,Artificial intelligence,Animation,Computer animation,Upsampling,Hidden Markov model,Dynamic Bayesian network | Conference |
ISSN | ISBN | Citations |
1520-6149 | 978-1-4244-4296-6 | 2 |
PageRank | References | Authors |
0.38 | 5 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dongmei Jiang | 1 | 115 | 15.28 |
Ilse Ravyse | 2 | 43 | 6.24 |
Peizhen Liu | 3 | 3 | 0.73 |
Hichem Sahli | 4 | 475 | 65.19 |
Werner Verhelst | 5 | 431 | 51.55 |