Title | ||
---|---|---|
Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra |
Abstract | ||
---|---|---|
• We propose novel training algorithms for vocoder-free text-to-speech synthesis using STFT spectra based on generative adversarial networks (GANs). • We demonstrate that using GANs with the original-frequency-resolution amplitude spectra degrades the synthetic speech quality. • We show that the proposed low-frequency-resolution GANs improve the synthetic speech quality. • We also show that using the inverse mel scale for the proposed algorithm further improves the synthetic speech quality. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1016/j.csl.2019.05.008 | Computer Speech & Language |
Keywords | Field | DocType |
Vocoder-free text-to-speech, Training algorithm, STFT amplitude spectra, Generative adversarial networks, Frequency resolution, Frequency warping | Inverse, Image warping, Hyperparameter, Computer science, Short-time Fourier transform, Fourier transform, Speech recognition, Speech perception, Amplitude, Acoustic model | Journal |
Volume | ISSN | Citations |
58 | 0885-2308 | 0 |
PageRank | References | Authors |
0.34 | 0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yuki Saito | 1 | 26 | 7.87 |
Shinnosuke Takamichi | 2 | 75 | 22.08 |
H. Saruwatari | 3 | 652 | 90.81 |
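
The abstract contrasts original-frequency-resolution and low-frequency-resolution STFT amplitude spectra, but this record does not say how the resolution is reduced. The sketch below illustrates one plausible reading, average pooling over neighbouring frequency bins; it is an assumption for illustration, not the authors' stated method. The function names (`stft_amplitude`, `reduce_frequency_resolution`), the frame length, hop size, pooling width, and the sine test signal are all hypothetical choices. The inverse-mel frequency warping mentioned in the abstract is not covered here.

```python
import numpy as np


def stft_amplitude(signal, frame_len=512, hop=128):
    """Return STFT amplitude spectra with shape (frames, frame_len // 2 + 1).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft keeps only the non-negative frequencies of a real-valued frame.
    return np.abs(np.fft.rfft(frames, axis=1))


def reduce_frequency_resolution(amp, pool=8):
    """Average-pool groups of `pool` neighbouring frequency bins.

    Assumption: pooling stands in for whatever resolution reduction the
    paper actually uses. Bins that do not fill a whole group are dropped.
    """
    n_bins = (amp.shape[1] // pool) * pool
    return amp[:, :n_bins].reshape(amp.shape[0], -1, pool).mean(axis=2)


# Hypothetical input: one second of a 440 Hz sine sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t)

amp = stft_amplitude(signal)                 # original frequency resolution
low = reduce_frequency_resolution(amp, 8)    # coarser frequency resolution
print(amp.shape, low.shape)
```

In a GAN setup of the kind the abstract describes, a discriminator could be fed `low` rather than `amp`; which spectra each network actually sees is a detail to confirm against the paper itself.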