Abstract |
---|
We present a linear three-dimensional modeling paradigm for the lips and face that captures the audiovisual speech activity of a given speaker with only six parameters. Our articulatory models are constructed from real data (front and profile images), using a linear component analysis of about 200 3D coordinates of fleshpoints on the subject's face and lips. Compared to a raw component analysis, our construction approach yields more comparable relations across subjects: by construction, the six parameters have a clear phonetic/articulatory interpretation. We use such a speaker-specific articulatory model to regularize MPEG-4 facial animation parameters (FAP) and show that this regularization process can drastically reduce bandwidth, noise, and quantization artifacts. We then show how analysis-by-synthesis techniques using the speaker-specific model allow the tracking of facial movements. Finally, the results of this tracking scheme have been used to develop a text-to-audiovisual speech system. |
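
The linear component analysis the abstract describes can be sketched as an SVD-based PCA over flattened 3D fleshpoint coordinates, reduced to six control parameters. This is a minimal illustration only: the data below are synthetic random frames, the frame count and exact decomposition are assumptions, and the paper's actual construction is guided by phonetic/articulatory criteria rather than a raw PCA.

```python
import numpy as np

# Hypothetical sketch of a linear component analysis of fleshpoint
# coordinates: ~200 3D points per frame reduced to six parameters.
# Synthetic data stand in for the paper's measured front/profile images.

rng = np.random.default_rng(0)

n_frames, n_points = 50, 200                   # assumed frame count; 200 fleshpoints
X = rng.normal(size=(n_frames, n_points * 3))  # each row: flattened (x, y, z) coords

mean = X.mean(axis=0)
Xc = X - mean                                  # center before extracting components

# SVD-based PCA: rows of Vt are linear deformation modes of the face/lips.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 6                                 # six control parameters, as in the abstract
components = Vt[:k]                   # (6, 600) basis of deformation modes
params = Xc @ components.T            # (50, 6) per-frame parameter values

# Any face shape is then approximated as mean + params @ components;
# regularization amounts to projecting noisy shapes onto this 6-D subspace.
X_hat = mean + params @ components
err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
```

On real articulatory data the low-dimensional reconstruction error would be small; on the random data above it is large, since Gaussian noise has no dominant linear structure.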
Year | Venue | Keywords
---|---|---
2001 | AVSP | analysis by synthesis, three dimensional

Field | DocType | Citations
---|---|---
Computer science, Speech recognition, Regularization (mathematics), Bandwidth (signal processing), 3D coordinates, Component analysis, Quantization (signal processing) | Conference | 17

PageRank | References | Authors
---|---|---
1.38 | 3 | 4

Name | Order | Citations | PageRank |
---|---|---|---
F. Elisei | 1 | 17 | 1.38 |
Matthias Odisio | 2 | 99 | 8.60 |
G. Bailly | 3 | 95 | 7.83
P. Badin | 4 | 17 | 1.38 |