GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram.

Paper Info

Title
GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram.

Abstract
Recent advances in neural network-based text-to-speech have reached human-level naturalness in synthetic speech. The present sequence-to-sequence models can directly map text to mel-spectrogram acoustic features, which are convenient for modeling, but present additional challenges for vocoding (i.e., waveform generation from the acoustic features). High-quality synthesis can be achieved with neural vocoders, such as WaveNet, but such autoregressive models suffer from slow sequential inference. Meanwhile, their existing parallel inference counterparts are difficult to train and require increasingly large model sizes. In this paper, we propose an alternative training strategy for a parallel neural vocoder utilizing generative adversarial networks, and integrate a linear predictive synthesis filter into the model. Results show that the proposed model achieves significant improvement in inference speed, while outperforming a WaveNet in copy-synthesis quality.
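
For readers unfamiliar with the term, a linear predictive synthesis filter is an all-pole filter that shapes an excitation signal with a spectral envelope. The following is a minimal sketch of that general idea only, not the paper's implementation: the function name lp_synthesis, the sign convention s[n] = e[n] - sum_k a[k] * s[n-k], and the toy coefficient are illustrative assumptions. In the model described in the abstract, the excitation would presumably come from the GAN generator rather than a hand-written impulse.

```python
def lp_synthesis(excitation, lpc_coeffs):
    """Run an excitation signal through an all-pole (linear predictive) filter.

    Assumed convention (illustrative, not taken from the paper):
        s[n] = e[n] - sum_{k=1..p} a[k] * s[n-k]
    where lpc_coeffs holds a[1..p].
    """
    order = len(lpc_coeffs)
    signal = []
    for n, e in enumerate(excitation):
        acc = e
        # Subtract the weighted contribution of previously synthesized samples.
        for k in range(1, order + 1):
            if n - k >= 0:
                acc -= lpc_coeffs[k - 1] * signal[n - k]
        signal.append(acc)
    return signal


# Toy example: a unit impulse through a one-pole filter with a[1] = -0.5
# gives a decaying exponential.
impulse = [1.0, 0.0, 0.0, 0.0]
print(lp_synthesis(impulse, [-0.5]))  # [1.0, 0.5, 0.25, 0.125]
```

The explicit loop makes the sample-by-sample recursion visible; it is written for clarity rather than speed.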

Year	2019
DOI	10.21437/interspeech.2019-2008
Venue	Conference of the International Speech Communication Association
DocType	Journal
Volume	abs/1904.03976
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Lauri Juvela	1	35	8.29
Bajibabu Bollepalli	2	22	7.17
Junichi Yamagishi	3	1906	145.51
Paavo Alku	4	728	98.07
