Title
An HMM/DNN Comparison for Synchronized Text-to-Speech and Tongue Motion Synthesis
Abstract
We present an end-to-end text-to-speech (TTS) synthesis system that generates audio and synchronized tongue motion directly from text. This is achieved by adapting a statistical shape space model of the tongue surface to an articulatory speech corpus and training a speech synthesis system directly on the tongue model parameter weights. We focus our analysis on the application of two standard methodologies, based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs), respectively, to train both acoustic models and the tongue model parameter weights. We evaluate both methodologies at every step by comparing the predicted articulatory movements against the reference data. The results show that even with less than two hours of data, DNNs already outperform HMMs.
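As a rough illustration of the DNN pathway the abstract describes (not the authors' implementation), the following PyTorch sketch maps frame-level linguistic feature vectors to tongue model parameter weights and scores the predictions by RMSE against reference trajectories. The dimensions (LING_DIM, TONGUE_DIM), the network shape, and the synthetic training data are all placeholder assumptions.

    # Minimal sketch: a feedforward DNN regresses tongue model parameter
    # weights from linguistic features; evaluation is RMSE against the
    # reference data. All dimensions and data here are illustrative.
    import torch
    import torch.nn as nn

    LING_DIM = 300   # hypothetical size of the per-frame linguistic feature vector
    TONGUE_DIM = 6   # hypothetical number of tongue shape space parameter weights

    model = nn.Sequential(
        nn.Linear(LING_DIM, 256), nn.Tanh(),
        nn.Linear(256, 256), nn.Tanh(),
        nn.Linear(256, TONGUE_DIM),
    )
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Synthetic stand-ins for time-aligned (linguistic features, tongue weights) frames.
    features = torch.randn(1000, LING_DIM)
    targets = torch.randn(1000, TONGUE_DIM)

    for epoch in range(10):
        optimizer.zero_grad()
        loss = loss_fn(model(features), targets)
        loss.backward()
        optimizer.step()

    # Evaluation: RMSE between predicted and reference articulatory trajectories.
    with torch.no_grad():
        rmse = torch.sqrt(loss_fn(model(features), targets))
    print(f"RMSE: {rmse.item():.4f}")

An HMM-based counterpart would typically be built with a dedicated statistical parametric synthesis toolkit rather than a few lines of code; this sketch covers only the DNN regression step and the trajectory-level RMSE evaluation used for comparison.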
Year
2017
DOI
10.21437/Interspeech.2017-936
Venue
18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols. 1-6: Situated Interaction
Keywords
Text-to-speech, multimodal synthesis, tongue modeling, articulatory animation
Field
Speech synthesis, Pattern recognition, Computer science, Speech recognition, Artificial intelligence, Hidden Markov model, Motion synthesis, Tongue
DocType
Conference
ISSN
2308-457X
Citations
0
PageRank
0.34
References
8
Authors
3
Name, Order, Citations, PageRank
Sébastien Le Maguer, 1, 29, 4.48
Ingmar Steiner, 2, 67, 12.25
Alexander Hewer, 3, 5, 1.78