Waveform Generation For Text-To-Speech Synthesis Using Pitch-Synchronous Multi-Scale Generative Adversarial Networks - Citegraph

Paper Info

Title
Waveform Generation For Text-To-Speech Synthesis Using Pitch-Synchronous Multi-Scale Generative Adversarial Networks

Abstract
The state-of-the-art in text-to-speech (TTS) synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet. However, these methods suffer from their slow sequential inference process, while their parallel versions are difficult to train and even more computationally expensive. Meanwhile, generative adversarial networks (GANs) have achieved impressive results in image generation and are making their way into audio applications; parallel inference is among their attractive properties. By adopting recent advances in GAN training techniques, this investigation studies waveform generation for TTS in two domains (speech signal and glottal excitation). Listening test results show that while direct waveform generation with a GAN is still far behind WaveNet, a GAN-based glottal excitation model can achieve quality and voice similarity on par with a WaveNet vocoder.

Year	2018
DOI	10.1109/icassp.2019.8683271
Venue	2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords	Neural vocoding, text-to-speech, GAN, glottal excitation model
Field	Image generation, Listening test, Inference, Computer science, Waveform, Speech recognition, Text to speech synthesis, Generative grammar, Adversarial system
DocType	Journal
Volume	abs/1810.12598
ISSN	1520-6149
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Lauri Juvela	1	35	8.29
Bajibabu Bollepalli	2	22	7.17
Junichi Yamagishi	3	1906	145.51
Paavo Alku	4	728	98.07