Title
Subband WaveNet with Overlapped Single-Sideband Filterbanks
Abstract
Compared with conventional vocoders, deep neural network-based raw audio generative models, such as WaveNet and SampleRNN, can synthesize speech signals more naturally, although synthesis speed is a problem, especially at high sampling frequencies. This paper proposes subband WaveNet, based on multirate signal processing, for high-speed and high-quality synthesis with raw audio generative models. In the training stage, speech waveforms are decomposed and decimated into short subband waveforms with a low sampling rate, and a separate WaveNet is trained on each subband stream. In the synthesis stage, each generated subband signal is upsampled and the results are integrated into a fullband speech signal. The results of objective and subjective experiments for unconditional WaveNet with a sampling frequency of 32 kHz indicate that the proposed subband WaveNet, using a square-root Hann window-based overlapped 9-channel single-sideband filterbank, achieves about four times the synthesis speed of the conventional fullband WaveNet and better synthesized speech quality.
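The decompose/decimate and upsample/integrate steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a whole-signal FFT realization, uniformly spaced band centres with 50% overlap, and a subband length of one quarter of the input for 9 channels; the names `sqrt_hann_gains`, `analyze`, and `synthesize` are hypothetical. The WaveNet models themselves are omitted; only the filterbank round trip is shown.

```python
import numpy as np

def sqrt_hann_gains(n_channels, spacing):
    """Square-root Hann band gains on the rfft grid (assumed layout).

    Channel k is centred at bin k * spacing and spans +/- spacing bins,
    so neighbouring bands overlap by half and their squared gains sum to 1.
    """
    n_bins = (n_channels - 1) * spacing + 1
    bins = np.arange(n_bins)[None, :]
    centres = np.arange(n_channels)[:, None] * spacing
    d = np.abs(bins - centres) / spacing          # distance from centre
    # sqrt(0.5 * (1 + cos(pi * d))) == cos(pi * d / 2) for 0 <= d < 1
    return np.where(d < 1.0, np.cos(0.5 * np.pi * d), 0.0)

def _band_bins(k, spacing, n_bins):
    """Fullband bin indices covered by channel k, plus a validity mask."""
    src = np.arange((k - 1) * spacing, (k + 1) * spacing + 1)
    valid = (src >= 0) & (src < n_bins)
    return src, valid

def analyze(x, n_channels=9):
    """Split x into real subband signals, each shifted to baseband and decimated."""
    n = len(x)
    spacing = n // (2 * (n_channels - 1))
    assert n == 2 * (n_channels - 1) * spacing, "unsupported signal length"
    gains = sqrt_hann_gains(n_channels, spacing)
    spectrum = np.fft.rfft(x)
    subbands = []
    for k in range(n_channels):
        src, valid = _band_bins(k, spacing, spectrum.size)
        # Single-sideband shift: the band's lower edge moves to DC.
        sub_spec = np.zeros(2 * spacing + 1, dtype=complex)
        sub_spec[valid] = spectrum[src[valid]] * gains[k, src[valid]]
        # A shorter inverse FFT performs the decimation (scaling omitted).
        subbands.append(np.fft.irfft(sub_spec, n=4 * spacing))
    return subbands

def synthesize(subbands):
    """Upsample each subband, shift it back to its band, and sum."""
    n_channels = len(subbands)
    spacing = len(subbands[0]) // 4
    gains = sqrt_hann_gains(n_channels, spacing)
    n_bins = (n_channels - 1) * spacing + 1
    spectrum = np.zeros(n_bins, dtype=complex)
    for k, sub in enumerate(subbands):
        src, valid = _band_bins(k, spacing, n_bins)
        sub_spec = np.fft.rfft(sub)
        spectrum[src[valid]] += sub_spec[valid] * gains[k, src[valid]]
    return np.fft.irfft(spectrum, n=2 * (n_bins - 1))

# Usage: round trip on a random signal of 1024 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
subbands = analyze(x)
y = synthesize(subbands)
```

Under these assumptions each of the 9 subbands has one quarter of the input length, so the per-band models could in principle generate in parallel at a quarter of the fullband rate. Because the squared gains of overlapping bands sum to one, applying the same gains in analysis and synthesis reconstructs the input up to floating-point error.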
Year: 2017
DOI: 10.1109/asru.2017.8269005
Venue: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
Keywords: Speech synthesis, WaveNet, subband processing, multirate signal processing, single-sideband filterbank
Field: Signal processing, Speech synthesis, Computer science, Filter bank, Sampling (signal processing), Hann function, Raw audio format, Speech recognition, Artificial neural network, Compatible sideband transmission
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name               Order  Citations  PageRank
Takuma Okamoto     1      4          3.46
Kentaro Tachibana  2      0          1.01
Tomoki Toda        3      1874       167.18
Yoshinori Shiga    4      45         13.35
Hisashi Kawai      5      250        54.04