Abstract |
---|
Articulatory information has been shown to be effective in improving the performance of hidden Markov model (HMM)-based text-to-speech (TTS) synthesis. Recently, deep learning-based TTS has outperformed HMM-based approaches. However, articulatory information has rarely been integrated into deep learning-based TTS. This paper investigated the effectiveness of integrating articulatory movement data into deep learning-based TTS. The integration of articulatory information was achieved in two ways: (1) direct integration, where articulatory and acoustic features were the output of a deep neural network (DNN), and (2) direct integration plus forward-mapping, where the output articulatory features were mapped to acoustic features by an additional DNN; these forward-mapped acoustic features were then combined with the output acoustic features to produce the final acoustic features. Articulatory (tongue and lip) and acoustic data collected from male and female speakers were used in the experiment. Both objective measures and subjective judgments by human listeners showed that the approaches integrating articulatory information outperformed the baseline approach (without articulatory information) in terms of naturalness and speaker voice identity (voice similarity). |
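The two integration schemes described in the abstract can be sketched with toy stand-in networks. This is a minimal illustration, not the paper's implementation: each "DNN" is reduced to a single linear layer, the feature dimensions are arbitrary, and the averaging rule used to combine the direct and forward-mapped acoustic features is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions for linguistic, acoustic, and articulatory features.
LING_DIM, ACOUS_DIM, ARTIC_DIM = 10, 6, 4

# Stand-ins for trained networks (real systems use multi-layer DNNs).
W1 = rng.standard_normal((LING_DIM, ACOUS_DIM + ARTIC_DIM))  # joint DNN
W2 = rng.standard_normal((ARTIC_DIM, ACOUS_DIM))             # forward-mapping DNN

def synthesize(linguistic):
    # (1) Direct integration: one network predicts acoustic and
    # articulatory features jointly from linguistic input.
    joint = linguistic @ W1
    acoustic_direct = joint[:, :ACOUS_DIM]
    articulatory = joint[:, ACOUS_DIM:]
    # (2) Forward-mapping: a second network maps the predicted
    # articulatory features back to acoustic features.
    acoustic_mapped = articulatory @ W2
    # Combine both acoustic streams (simple average; the paper's
    # combination rule may differ).
    return 0.5 * (acoustic_direct + acoustic_mapped)

frames = rng.standard_normal((3, LING_DIM))  # 3 frames of linguistic features
out = synthesize(frames)
print(out.shape)  # (3, 6)
```

The sketch only shows the data flow at synthesis time; training the two networks (jointly or separately) is outside its scope.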
Year | DOI | Venue
---|---|---
2017 | 10.21437/Interspeech.2017-1762 | 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction
Keywords | Field | DocType
---|---|---
text-to-speech synthesis, articulatory data, deep learning, deep neural network | Computer science, Speech recognition, Text to speech synthesis, Artificial intelligence, Deep learning | Conference
ISSN | Citations | PageRank
---|---|---
2308-457X | 2 | 0.38
References | Authors
---|---
0 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Beiming Cao | 1 | 8 | 1.51 |
Myung Jong Kim | 2 | 31 | 6.30 |
Jan P. H. van Santen | 3 | 514 | 99.66 |
Ted Mau | 4 | 11 | 2.29
Jun Wang | 5 | 144 | 15.26 |