Abstract |
---|
Compared with conventional vocoders, deep neural network-based raw audio generative models such as WaveNet and SampleRNN can synthesize speech signals more naturally, but their synthesis speed is a problem, especially at high sampling frequencies. This paper proposes subband WaveNet, which is based on multirate signal processing, for high-speed and high-quality synthesis with raw audio generative models. In the training stage, speech waveforms are decomposed and decimated into short subband waveforms with a low sampling rate, and a separate WaveNet is trained on each subband stream. In the synthesis stage, each generated subband signal is upsampled and the results are integrated into a fullband speech signal. The results of objective and subjective experiments for unconditional WaveNet with a sampling frequency of 32 kHz indicate that the proposed subband WaveNet with a square-root Hann window-based overlapped 9-channel single-sideband filterbank can achieve about four times the synthesis speed of the conventional fullband WaveNet while also improving the synthesized speech quality. |
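The analysis and synthesis stages described in the abstract can be illustrated with a much simpler filterbank than the one used in the paper. The sketch below is an assumption-laden toy: it uses a two-channel Haar filterbank (not the square-root Hann window-based 9-channel single-sideband filterbank), the function names `analyze` and `synthesize` are hypothetical, and the subband signals are passed through unchanged where the proposed method would generate each one with its own WaveNet. It only shows the multirate idea: each subband runs at a lower sampling rate, and upsampling plus recombination recovers the fullband signal.

```python
import math

SQRT2 = math.sqrt(2.0)

def analyze(x):
    """Split an even-length signal into two subbands, each decimated by 2.

    Toy two-channel Haar filterbank (an assumption for illustration only).
    """
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def synthesize(low, high):
    """Upsample the two subbands and integrate them into a fullband signal."""
    y = []
    for l, h in zip(low, high):
        y.append((l + h) / SQRT2)  # even-indexed output sample
        y.append((l - h) / SQRT2)  # odd-indexed output sample
    return y

fs = 32000  # fullband sampling frequency from the abstract (32 kHz)
x = [math.sin(2 * math.pi * 440 * n / fs) for n in range(320)]  # 10 ms test tone

low, high = analyze(x)       # each subband holds half as many samples as x
y = synthesize(low, high)    # reconstruction from the untouched subbands
max_error = max(abs(a - b) for a, b in zip(x, y))
```

Because each subband stream is shorter than the fullband waveform, a sample-by-sample generative model has fewer steps to run per stream, and the streams can be generated in parallel. That is the intuition behind the speed-up reported in the abstract; the actual factor depends on the paper's filterbank and decimation settings, which this toy does not reproduce.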
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/asru.2017.8269005 | 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) |
Keywords | Field | DocType |
---|---|---|
Speech synthesis, WaveNet, subband processing, multirate signal processing, single-sideband filterbank | Signal processing, Speech synthesis, Computer science, Filter bank, Sampling (signal processing), Hann function, Raw audio format, Speech recognition, Artificial neural network, Compatible sideband transmission | Conference |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors |
---|
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Takuma Okamoto | 1 | 4 | 3.46 |
Kentaro Tachibana | 2 | 0 | 1.01 |
Tomoki Toda | 3 | 1874 | 167.18 |
Yoshinori Shiga | 4 | 45 | 13.35 |
Hisashi Kawai | 5 | 250 | 54.04 |