Title
CAPTURING DATA AND REALISTIC 3D MODELS FOR CUED SPEECH ANALYSIS AND AUDIOVISUAL SYNTHESIS
Abstract
We have implemented a complete concatenation-based text-to-speech synthesis system that addresses French Manual Cued Speech (FMCS). It uses two separate dictionaries: one of multimodal diphones with audio and facial articulation, and one of gestures between two consecutive FMCS keys ("dikeys"). Both dictionaries were built from real data. This paper presents our methodology and the final results, illustrated by the accompanying videos. We recorded and analyzed the 3D trajectories of 50 hand and 63 facial fleshpoints during the production of 238 utterances carefully designed to cover all possible diphones of French. Linear and non-linear statistical models of hand and face deformations and postures were developed using both separate and joint corpora. Additional data allowed us to capture the shape of the hand and face at a higher spatial density (2,600 points for the hand and forearm, 2,000 for the face), as well as their appearance. We succeeded in building new high-density articulated models that remain compatible with the set of control parameters that emerged from the low-density models. The synthesis parameters output by the system can therefore drive the more realistic 3D models instead of the low-density ones.
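The linear statistical modeling of fleshpoint postures described in the abstract can be illustrated with a small PCA sketch. The data, shapes, and number of modes below are synthetic placeholders, not the paper's corpus; this only shows the general technique of extracting a mean posture plus a few deformation modes from per-frame 3D point coordinates.

```python
import numpy as np

# Hypothetical illustration of a linear statistical model of fleshpoint
# configurations (PCA over motion-capture frames). Synthetic data stands in
# for the recorded hand/face trajectories.
rng = np.random.default_rng(0)
n_frames, n_points = 200, 50                    # e.g. 50 hand fleshpoints per frame
latent = rng.normal(size=(n_frames, 3))         # 3 underlying articulatory DOFs
basis = rng.normal(size=(3, n_points * 3))      # each point has x, y, z
frames = latent @ basis + rng.normal(scale=0.01, size=(n_frames, n_points * 3))

# Fit the linear model: mean posture plus principal deformation modes.
mean = frames.mean(axis=0)
_, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
n_modes = 3
modes = vt[:n_modes]                            # deformation basis vectors
weights = (frames - mean) @ modes.T             # low-dimensional control parameters

# Any posture is reconstructed from the few control parameters; the same
# parameters could drive a denser, compatible articulated model.
recon = mean + weights @ modes
err = np.abs(recon - frames).max()
print(f"max reconstruction error with {n_modes} modes: {err:.4f}")
```

The key property exploited in the paper is that such a compact control-parameter set, once learned from sparse fleshpoints, can also drive higher-density meshes built to be compatible with it.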
Year
2005
Venue
AVSP
Keywords
cued speech, statistical model
Field
Spatial density, Gesture, Computer science, Cued speech, Speech recognition, Statistical model, Artificial intelligence, Concatenation, Natural language processing
DocType
Conference
Citations
1
PageRank
0.37
References
13
Authors
4
Name              Order  Citations  PageRank
Frédéric Elisei   1      275        25.05
Gérard Bailly     2      609        99.37
Guillaume Gibert  3      84         10.48
Rémi Brun         4      7          1.86