Title
An HMM/DNN Comparison for Synchronized Text-to-Speech and Tongue Motion Synthesis
Abstract
We present an end-to-end text-to-speech (TTS) synthesis system that generates audio and synchronized tongue motion directly from text. This is achieved by adapting a statistical shape space model of the tongue surface to an articulatory speech corpus and training a speech synthesis system directly on the tongue model parameter weights. We focus our analysis on the application of two standard methodologies, based on Hidden Markov Models (HMMs) and Deep Neural Networks (DNNs), respectively, to train both acoustic models and the tongue model parameter weights. We evaluate both methodologies at every step by comparing the predicted articulatory movements against the reference data. The results show that even with less than two hours of data, DNNs already outperform HMMs.
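As a rough illustration of the DNN pathway the abstract describes (not the authors' implementation), the following PyTorch sketch maps frame-level linguistic feature vectors to tongue model parameter weights and scores the predictions by RMSE against reference trajectories. The dimensions (LING_DIM, TONGUE_DIM), the network shape, and the synthetic training data are all placeholder assumptions.

    # Minimal sketch: a feedforward DNN regresses tongue model parameter
    # weights from linguistic features; evaluation is RMSE against the
    # reference data. All dimensions and data here are illustrative.
    import torch
    import torch.nn as nn

    LING_DIM = 300   # hypothetical size of the per-frame linguistic feature vector
    TONGUE_DIM = 6   # hypothetical number of tongue shape space parameter weights

    model = nn.Sequential(
        nn.Linear(LING_DIM, 256), nn.Tanh(),
        nn.Linear(256, 256), nn.Tanh(),
        nn.Linear(256, TONGUE_DIM),
    )
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Synthetic stand-ins for time-aligned (linguistic features, tongue weights) frames.
    features = torch.randn(1000, LING_DIM)
    targets = torch.randn(1000, TONGUE_DIM)

    for epoch in range(10):
        optimizer.zero_grad()
        loss = loss_fn(model(features), targets)
        loss.backward()
        optimizer.step()

    # Evaluation: RMSE between predicted and reference articulatory trajectories.
    with torch.no_grad():
        rmse = torch.sqrt(loss_fn(model(features), targets))
    print(f"RMSE: {rmse.item():.4f}")

An HMM-based counterpart would typically be built with a dedicated statistical parametric synthesis toolkit rather than a few lines of code; this sketch covers only the DNN regression step and the trajectory-level RMSE evaluation used for comparison.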
Year
2017
DOI
10.21437/Interspeech.2017-936
Venue
18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols. 1-6: Situated Interaction
Keywords
Text-to-speech, multimodal synthesis, tongue modeling, articulatory animation
Field
Speech synthesis, Pattern recognition, Computer science, Speech recognition, Artificial intelligence, Hidden Markov model, Motion synthesis, Tongue
DocType
Conference
ISSN
2308-457X
Citations
0
PageRank
0.34
References
8
Authors
3
Name, Order, Citations, PageRank
Sébastien Le Maguer, 1, 29, 4.48
Ingmar Steiner, 2, 67, 12.25
Alexander Hewer, 3, 5, 1.78