Title
CREATING AND CONTROLLING VIDEO-REALISTIC TALKING HEADS
Abstract
We present a linear three-dimensional modeling paradigm that captures the audiovisual speech activity of a given speaker's lips and face with only six parameters. Our articulatory models are constructed from real data (front and profile images) using a linear component analysis of about 200 3D coordinates of fleshpoints on the subject's face and lips. Compared to a raw component analysis, our construction approach leads to somewhat more comparable relations across subjects: by construction, the six parameters have a clear phonetic/articulatory interpretation. We use such a speaker-specific articulatory model to regularize MPEG-4 facial animation parameters (FAPs) and show that this regularization process can drastically reduce bandwidth, noise, and quantization artifacts. We then show how analysis-by-synthesis techniques using the speaker-specific model allow the tracking of facial movements. Finally, the results of this tracking scheme have been used to develop a text-to-audiovisual speech system.
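The abstract's core idea, reducing roughly 200 3D fleshpoint coordinates to six control parameters via a linear component analysis, can be sketched as follows. This is a minimal illustrative stand-in using plain PCA (via SVD); the paper's actual construction is a guided decomposition whose six parameters carry phonetic/articulatory meaning, and all function names here are hypothetical.

```python
import numpy as np

def fit_linear_model(frames, n_components=6):
    """Fit a linear shape model to motion-capture frames.

    frames: (n_frames, n_points * 3) array of stacked 3D fleshpoint
    coordinates. Returns (mean, basis), where basis holds n_components
    principal directions as rows.
    """
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def encode(frames, mean, basis):
    # Project each frame onto the basis: six parameters per frame.
    return (frames - mean) @ basis.T

def decode(params, mean, basis):
    # Reconstruct fleshpoint coordinates from the six parameters.
    return mean + params @ basis
```

When the underlying motion truly lies in a low-dimensional linear subspace, as the paper argues for speech articulation, the six-parameter reconstruction is essentially lossless; the same encode/decode pair is what makes the model usable for regularizing noisy FAP streams and for analysis-by-synthesis tracking.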
Year
2001
Venue
AVSP
Keywords
analysis by synthesis, three dimensional
Field
Computer science, Speech recognition, Regularization (mathematics), Bandwidth (signal processing), 3D coordinates, Component analysis, Quantization (signal processing)
DocType
Conference
Citations
17
PageRank
1.38
References
3
Authors
4

Name             Order  Citations  PageRank
F. Elisei        1      17         1.38
Matthias Odisio  2      99         8.60
G Bailly         3      95         7.83
P. Badin         4      17         1.38