Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra - Citegraph

Paper Info

Title
Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra

Abstract
• We propose novel training algorithms for vocoder-free text-to-speech synthesis using STFT spectra based on generative adversarial networks (GANs).
• We demonstrate that using GANs with the original-frequency-resolution amplitude spectra degrades the synthetic speech quality.
• We show that the proposed low-frequency-resolution GANs improve the synthetic speech quality.
• We also show that using the inverse mel scale for the proposed algorithm further improves the synthetic speech quality.

Year	DOI	Venue
2019	10.1016/j.csl.2019.05.008	Computer Speech & Language
Keywords	Field	DocType
Vocoder-free text-to-speech, Training algorithm, STFT amplitude spectra, Generative adversarial networks, Frequency resolution, Frequency warping	Inverse, Image warping, Hyperparameter, Computer science, Short-time Fourier transform, Fourier transform, Speech recognition, Speech perception, Amplitude, Acoustic model	Journal
Volume	ISSN	Citations
58	0885-2308	0
PageRank	References	Authors
0.34	0	3

Authors (3 rows)

Name	Order	Citations	PageRank
Saito, Yuki	1	26	7.87
Takamichi, Shinnosuke	2	75	22.08
Saruwatari, H.	3	652	90.81
