Title
Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra
Abstract
• We propose novel training algorithms, based on generative adversarial networks (GANs), for vocoder-free text-to-speech synthesis using STFT spectra.
• We demonstrate that using GANs with the original-frequency-resolution amplitude spectra degrades the synthetic speech quality.
• We show that the proposed low-frequency-resolution GANs improve the synthetic speech quality.
• We also show that using the inverse mel scale for the proposed algorithm further improves the synthetic speech quality.
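The highlights contrast original-frequency-resolution and low-frequency-resolution amplitude spectra but do not say how the resolution is reduced. As a rough illustration only, the sketch below assumes the reduction is done by averaging non-overlapping groups of adjacent frequency bins in each STFT frame. The function names and the `width` parameter are hypothetical and are not taken from the paper.

```python
def average_pool_frequency(amplitudes, width):
    """Average non-overlapping groups of `width` adjacent frequency bins.

    `amplitudes` is one STFT frame (a list of non-negative floats).
    Assumption: plain average pooling stands in for the unspecified
    resolution-reduction step; a shorter final group is averaged as-is.
    """
    if width < 1:
        raise ValueError("width must be a positive integer")
    pooled = []
    for start in range(0, len(amplitudes), width):
        group = amplitudes[start:start + width]
        pooled.append(sum(group) / len(group))
    return pooled


def low_resolution_spectrogram(frames, width):
    """Apply the same frequency pooling to every frame of a spectrogram."""
    return [average_pool_frequency(frame, width) for frame in frames]


# One toy frame with 9 frequency bins, pooled in groups of 4.
frame = [1.0, 3.0, 2.0, 4.0, 5.0, 7.0, 6.0, 8.0, 9.0]
print(average_pool_frequency(frame, 4))  # prints [2.5, 6.5, 9.0]
```

Under this assumption a larger `width` gives a coarser spectrum, so a discriminator fed the pooled frames would see only the broad spectral shape rather than fine harmonic detail.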
Year
2019
DOI
10.1016/j.csl.2019.05.008
Venue
Computer Speech & Language
Keywords
Vocoder-free text-to-speech, Training algorithm, STFT amplitude spectra, Generative adversarial networks, Frequency resolution, Frequency warping
Field
Inverse, Image warping, Hyperparameter, Computer science, Short-time Fourier transform, Fourier transform, Speech recognition, Speech perception, Amplitude, Acoustic model
DocType
Journal
Volume
58
ISSN
0885-2308
Citations
0
PageRank
0.34
References
0
Authors
3
Name                   Order  Citations  PageRank
Saito, Yuki            1      26         7.87
Shinnosuke Takamichi   2      75         22.08
Saruwatari, H.         3      652        90.81