Title
Waveform Generation For Text-To-Speech Synthesis Using Pitch-Synchronous Multi-Scale Generative Adversarial Networks
Abstract
The state of the art in text-to-speech (TTS) synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet. However, these methods suffer from a slow sequential inference process, while their parallel versions are difficult to train and even more computationally expensive. Meanwhile, generative adversarial networks (GANs) have achieved impressive results in image generation and are making their way into audio applications; parallel inference is among their attractive properties. By adopting recent advances in GAN training techniques, this investigation studies waveform generation for TTS in two domains: the speech signal and the glottal excitation. Listening test results show that while direct waveform generation with a GAN is still far behind WaveNet, a GAN-based glottal excitation model can achieve quality and voice similarity on par with a WaveNet vocoder.
Year
2018
DOI
10.1109/icassp.2019.8683271
Venue
2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords
Neural vocoding, text-to-speech, GAN, glottal excitation model
Field
Image generation, Listening test, Inference, Computer science, Waveform, Speech recognition, Text to speech synthesis, Generative grammar, Adversarial system
DocType
Journal
Volume
abs/1810.12598
ISSN
1520-6149
Citations
0
PageRank
0.34
References
0
Authors
4
Name                 Order  Citations  PageRank
Lauri Juvela         1      35         8.29
Bajibabu Bollepalli  2      22         7.17
Junichi Yamagishi    3      1906       145.51
Paavo Alku           4      728        98.07