Title
CAPTURING DATA AND REALISTIC 3D MODELS FOR CUED SPEECH ANALYSIS AND AUDIOVISUAL SYNTHESIS
Abstract
We have implemented a complete concatenation-based text-to-speech synthesis system that addresses French Manual Cued Speech (FMCS). It uses two separate dictionaries: one of multimodal diphones with audio and facial articulation, and one of gestures between two consecutive FMCS keys ("dikeys"). Both dictionaries were built from real data. This paper presents our methodology and the final results, illustrated by the accompanying videos. We recorded and analyzed the 3D trajectories of 50 hand and 63 facial fleshpoints during the production of 238 utterances carefully designed to cover all possible diphones of French. Linear and non-linear statistical models of hand and face deformations and postures were developed using both separate and joint corpora. Additional data allowed us to capture the shape of the hand and face at a higher spatial density (2,600 points for the hand and forearm, 2,000 for the face), as well as their appearance. We succeeded in building new high-density articulated models that remain compatible with the set of control parameters that emerged from the low-density models. The synthesis parameters output by the system can therefore drive the more realistic 3D models instead of the low-density ones.
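The linear statistical modeling of fleshpoint postures described in the abstract can be illustrated with a small PCA sketch. The data, shapes, and number of modes below are synthetic placeholders, not the paper's corpus; this only shows the general technique of extracting a mean posture plus a few deformation modes from per-frame 3D point coordinates.

```python
import numpy as np

# Hypothetical illustration of a linear statistical model of fleshpoint
# configurations (PCA over motion-capture frames). Synthetic data stands in
# for the recorded hand/face trajectories.
rng = np.random.default_rng(0)
n_frames, n_points = 200, 50                    # e.g. 50 hand fleshpoints per frame
latent = rng.normal(size=(n_frames, 3))         # 3 underlying articulatory DOFs
basis = rng.normal(size=(3, n_points * 3))      # each point has x, y, z
frames = latent @ basis + rng.normal(scale=0.01, size=(n_frames, n_points * 3))

# Fit the linear model: mean posture plus principal deformation modes.
mean = frames.mean(axis=0)
_, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
n_modes = 3
modes = vt[:n_modes]                            # deformation basis vectors
weights = (frames - mean) @ modes.T             # low-dimensional control parameters

# Any posture is reconstructed from the few control parameters; the same
# parameters could drive a denser, compatible articulated model.
recon = mean + weights @ modes
err = np.abs(recon - frames).max()
print(f"max reconstruction error with {n_modes} modes: {err:.4f}")
```

The key property exploited in the paper is that such a compact control-parameter set, once learned from sparse fleshpoints, can also drive higher-density meshes built to be compatible with it.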
Year
2005
Venue
AVSP
Keywords
cued speech, statistical model
Field
Spatial density, Gesture, Computer science, Cued speech, Speech recognition, Statistical model, Artificial intelligence, Concatenation, Natural language processing
DocType
Conference
Citations
1
PageRank
0.37
References
13
Authors
4
Name              Order  Citations  PageRank
Frédéric Elisei   1      275        25.05
Gérard Bailly     2      609        99.37
Guillaume Gibert  3      84         10.48
Rémi Brun         4      7          1.86