Abstract |
---|
This paper studies methods for emotional speech synthesis using a neural vocoder. WaveNet is used as the neural vocoder, generating waveforms from mel spectrograms. We propose two networks, i.e., a deep convolutional neural network (CNN)-based text-to-speech (TTS) system and an emotional converter; the deep CNN architecture is designed to utilize long-term context information. The first network estimates neutral mel spectrograms from linguistic features, and the second converts neutral mel spectrograms to emotional mel spectrograms. Experimental results on a TTS system and an emotional TTS system show that the proposed systems are a promising approach. |
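The two-stage pipeline described in the abstract (linguistic features → neutral mel spectrogram → emotional mel spectrogram, followed by WaveNet vocoding) can be sketched as follows. This is a hypothetical dataflow illustration, not the paper's implementation: the feature dimensionalities, the use of random linear maps as stand-ins for the trained deep CNNs, and the function names are all assumptions.

```python
import numpy as np

MEL_BINS = 80    # assumed mel-spectrogram dimensionality
LING_DIM = 300   # assumed linguistic-feature dimensionality

rng = np.random.default_rng(0)

def tts_network(linguistic_feats: np.ndarray) -> np.ndarray:
    """Stage 1 (stand-in): the deep CNN-based TTS network that maps
    linguistic features to a neutral mel spectrogram. A random linear
    projection stands in for the trained network."""
    W = rng.standard_normal((LING_DIM, MEL_BINS))
    return linguistic_feats @ W  # shape: (frames, MEL_BINS)

def emotion_converter(neutral_mel: np.ndarray) -> np.ndarray:
    """Stage 2 (stand-in): the emotional converter that maps neutral
    mel frames to emotional mel frames; here a frame-wise linear map."""
    W = rng.standard_normal((MEL_BINS, MEL_BINS))
    return neutral_mel @ W  # shape: (frames, MEL_BINS)

frames = 120
ling = rng.standard_normal((frames, LING_DIM))  # dummy linguistic features
neutral = tts_network(ling)
emotional = emotion_converter(neutral)
# In the paper, the WaveNet vocoder would then synthesize the waveform
# from the emotional mel spectrogram; that step is omitted here.
print(neutral.shape, emotional.shape)
```

The point of the sketch is the dataflow: emotion is imposed by transforming the intermediate mel-spectrogram representation, so the same vocoder can render both neutral and emotional speech.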
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICCE.2019.8661919 | 2019 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE) |
Field | DocType | ISSN
---|---|---|
Computer vision, Speech synthesis, Computer science, Convolutional neural network, Spectrogram, Speech recognition, Artificial intelligence | Conference | 2158-3994
Citations | PageRank | References
---|---|---|
0 | 0.34 | 0
Authors |
---|
4
Name | Order | Citations | PageRank |
---|---|---|---|
Heejin Choi | 1 | 6 | 1.80 |
Sangjun Park | 2 | 2 | 2.43
Jinuk Park | 3 | 2 | 2.74 |
Minsoo Hahn | 4 | 223 | 46.63 |