Picture My Voice: Audio to Visual Speech Synthesis using Artificial Neural Networks - Citegraph

Paper Info

Title
Picture My Voice: Audio to Visual Speech Synthesis using Artificial Neural Networks

Abstract
This paper presents an initial implementation and evaluation of a system that synthesizes visual speech directly from the acoustic waveform. An artificial neural network (ANN) was trained to map the cepstral coefficients of an individual's natural speech to the control parameters of an animated synthetic talking head. We trained on two data sets; one was a set of 400 words spoken in isolation by a single speaker and the other a subset of extemporaneous speech from 10 different speakers. The system showed learning in both cases. A perceptual evaluation test indicated that the system's generalization to new words by the same speaker provides significant visible information, but significantly below that given by a text-to-speech algorithm.

Year	Venue	Keywords
1999	AVSP	speech synthesis, neural network, artificial neural network, text to speech
Field	DocType	Citations
Motion capture, Speech synthesis, Gesture, Computer science, Communication channel, Speech recognition, Coarticulation, Animation, Artificial neural network, Perception	Conference	40
PageRank	References	Authors
1.90	8	5

Authors (5 rows)

Name	Order	Citations	PageRank
Dominic W. Massaro	1	391	49.07
Jonas Beskow	2	668	96.64
Michael M. Cohen	3	268	34.79
Christopher L. Fry	4	40	2.23
Tony Rodriguez	5	41	2.29
