Title | ||
---|---|---|
Capturing Data and Realistic 3D Models for Cued Speech Analysis and Audiovisual Synthesis |
Abstract | ||
---|---|---|
We have implemented a complete concatenative text-to-speech synthesis system for French Manual Cued Speech (FMCS). It uses two separate dictionaries: one of multimodal diphones, with audio and facial articulation, and one of the gestures between two consecutive FMCS keys ("dikeys"). Both dictionaries were built from real data. This paper presents our methodology and the final results, illustrated by the accompanying videos. We recorded and analyzed the 3D trajectories of 50 hand and 63 facial fleshpoints during the production of 238 utterances carefully designed to cover all possible diphones of French. Linear and non-linear statistical models of hand and face deformations and postures were developed using both separate and joint corpora. Additional data allowed us to capture the shape of the hand and face at a higher spatial density (2,600 points for the hand and forearm and 2,000 for the face), as well as their appearance. We succeeded in building new high-density articulated models that are compatible with the set of control parameters that emerged from the earlier low-density models. The synthesis parameters output by the system can therefore drive the more realistic 3D models instead of the low-density ones. |
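
The abstract describes cutting utterances into units that span two consecutive symbols (diphones for speech, "dikeys" for FMCS keys) and designing a corpus that covers them. The following is a minimal sketch of that idea, not the authors' implementation. The silence marker `SIL`, the helper names `units` and `coverage`, and the toy inventory are all hypothetical, and treating every ordered pair as a possible unit is a simplifying assumption (real French phonotactics rule some pairs out).

```python
from itertools import product

SIL = "_"  # hypothetical silence marker padding each utterance


def units(symbols):
    """Return the consecutive pairs (diphones or dikeys) of a symbol sequence."""
    padded = [SIL, *symbols, SIL]
    return list(zip(padded, padded[1:]))


def coverage(corpus, inventory):
    """Fraction of all ordered pairs over `inventory` (plus silence) seen in `corpus`."""
    # Assumption: every ordered pair is a candidate unit, except silence-silence.
    wanted = set(product([SIL, *inventory], repeat=2)) - {(SIL, SIL)}
    seen = {u for utterance in corpus for u in units(utterance)}
    return len(seen & wanted) / len(wanted)


# Toy inventory and corpus for illustration only, not the paper's data.
toy_inventory = ["a", "p", "t"]
toy_corpus = [["p", "a", "t", "a"], ["t", "a", "p"]]

print(units(toy_corpus[0]))
print(coverage(toy_corpus, toy_inventory))
```

The same `units` helper applies unchanged to a sequence of FMCS keys, which is why a single pairing routine can feed both dictionaries in a sketch like this one.
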
Year | Venue | Keywords |
---|---|---|
2005 | AVSP | cued speech, statistical model |
Field | DocType | Citations |
---|---|---|
Spatial density, Gesture, Computer science, Cued speech, Speech recognition, Statistical model, Artificial intelligence, Concatenation, Natural language processing | Conference | 1 |
PageRank | References | Authors |
---|---|---|
0.37 | 13 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Frédéric Elisei | 1 | 275 | 25.05 |
Gérard Bailly | 2 | 609 | 99.37 |
Guillaume Gibert | 3 | 84 | 10.48 |
Rémi Brun | 4 | 7 | 1.86 |