Title
Emotional Speech Synthesis For Multi-Speaker Emotional Dataset Using Wavenet Vocoder
Abstract
This paper studies methods for emotional speech synthesis using a neural vocoder. WaveNet is used as the neural vocoder, generating waveforms from mel spectrograms. We propose two networks, namely a deep convolutional neural network (CNN)-based text-to-speech (TTS) system and an emotional converter, where the deep CNN architecture is designed to exploit long-term context information. The first network estimates neutral mel spectrograms from linguistic features, and the second network converts the neutral mel spectrograms into emotional mel spectrograms. Experimental results on a TTS system and an emotional TTS system show that the proposed systems are a promising approach.
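The abstract describes a two-stage pipeline: a TTS network maps linguistic features to neutral mel spectrograms, an emotional converter maps those to emotional mel spectrograms, and a WaveNet vocoder then synthesizes the waveform. A minimal sketch of that data flow is below; all shapes, layer choices, and function names are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def text_to_neutral_mel(linguistic_feats):
    """Stage 1 stand-in for the deep-CNN TTS network:
    map linguistic features (T x F_ling) to neutral mel frames (T x 80).
    A single random linear projection replaces the real CNN stack."""
    W = rng.standard_normal((linguistic_feats.shape[1], 80)) * 0.01
    return linguistic_feats @ W

def neutral_to_emotional_mel(neutral_mel, emotion_shift):
    """Stage 2 stand-in for the emotional converter:
    shift neutral mel frames toward an emotional rendition.
    The real converter is a learned CNN mapping, not a constant offset."""
    return neutral_mel + emotion_shift

# Toy run: 100 frames of 40-dimensional linguistic features.
ling = rng.standard_normal((100, 40))
neutral = text_to_neutral_mel(ling)
emotional = neutral_to_emotional_mel(neutral, emotion_shift=0.5)

# A WaveNet vocoder would then generate the waveform from `emotional`;
# that step is omitted here.
print(neutral.shape, emotional.shape)
```

The point of the sketch is only the interface between the stages: both networks operate in the mel-spectrogram domain, so the vocoder never needs to see whether its input came from the neutral or the emotional path.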
Year: 2019
DOI: 10.1109/ICCE.2019.8661919
Venue: 2019 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE)
Field: Computer vision, Speech synthesis, Computer science, Convolutional neural network, Spectrogram, Speech recognition, Artificial intelligence
DocType: Conference
ISSN: 2158-3994
Citations: 0
PageRank: 0.34
References: 0
Authors: 4

Name          Order  Citations  PageRank
Heejin Choi   1      6          1.80
Sangjun Park  2      2          2.43
Jinuk Park    3      2          2.74
Minsoo Hahn   4      223        46.63