Title
Voice Conversion With CycleRNN-Based Spectral Mapping and Finely Tuned WaveNet Vocoder
Abstract
In this paper, we present a novel voice conversion (VC) framework based on a cyclic recurrent neural network (CycleRNN) and a finely tuned WaveNet vocoder. Although WaveNet produces natural speech waveforms when conditioned on natural speech features, its output quality degrades when it is conditioned on oversmoothed features, such as spectral parameters estimated by a statistical model. One way to address this problem is to include oversmoothed features when training the WaveNet model. In a VC framework, however, obtaining oversmoothed spectral features of the target speaker for WaveNet modeling is not straightforward, because the converted features follow the time-sequence alignment of the source speaker rather than that of the target speaker. To overcome this problem, we propose a cyclic spectral conversion network, CycleRNN, which performs both a conversion flow (source-to-target) and a cyclic flow that generates self-predicted target spectra, i.e., oversmoothed target spectra that keep the time alignment of the natural target features. The CycleRNN spectral model is trained using both a conversion loss and a weighted cyclic loss. To fine-tune WaveNet, a pretrained multispeaker WaveNet model is optimized using the self-predicted features of the target speaker in each speaker conversion pair. The experimental results demonstrate that 1) using the proposed CycleRNN-based spectral model for WaveNet fine-tuning significantly improves the naturalness of the converted speech waveforms, giving an overall mean opinion score (MOS) of 3.50; and 2) the proposed model yields the highest speaker conversion accuracy, giving an overall speaker similarity score of 78.33%, a significant improvement over conventional WaveNet fine-tuning using natural target features.
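The abstract names two training ingredients, a conversion loss combined with a weighted cyclic loss, and a cyclic flow that yields self-predicted target spectra. The sketch below illustrates both in plain Python under explicit assumptions: the mappings `f_st` (source-to-target) and `f_ts` (target-to-source) are hypothetical affine stand-ins for the paper's recurrent networks, both losses are taken as mean absolute (L1) distances, the conversion loss assumes the target features are already time-aligned to the source, and the weight `lambda_cyc` is a free, hypothetical parameter. It is an illustration of the idea, not the authors' implementation.

```python
from typing import Callable, List, Tuple

Frame = List[float]
Sequence = List[Frame]
# Hypothetical: a mapping converts a whole feature sequence, as an RNN would.
Mapping = Callable[[Sequence], Sequence]


def frame_affine(scale: float, shift: float) -> Mapping:
    """Hypothetical stand-in for an RNN: applies scale * v + shift to every value."""
    def apply(seq: Sequence) -> Sequence:
        return [[scale * v + shift for v in frame] for frame in seq]
    return apply


def l1_distance(a: Sequence, b: Sequence) -> float:
    """Mean absolute error over all frames and dimensions of two aligned sequences."""
    assert len(a) == len(b), "sequences must have the same number of frames"
    total = sum(abs(p - q) for fa, fb in zip(a, b) for p, q in zip(fa, fb))
    count = sum(len(frame) for frame in a)
    return total / count


def cycle_losses(x: Sequence, y_aligned: Sequence, f_st: Mapping,
                 f_ts: Mapping, lambda_cyc: float) -> Tuple[float, float, float]:
    """Assumed loss form: L1 conversion loss plus lambda_cyc times L1 cyclic loss."""
    converted = f_st(x)              # conversion flow: source -> target
    reconverted = f_ts(converted)    # cyclic flow: converted target -> source
    conv_loss = l1_distance(converted, y_aligned)
    cyc_loss = l1_distance(reconverted, x)
    return conv_loss, cyc_loss, conv_loss + lambda_cyc * cyc_loss


def self_predicted_target(y: Sequence, f_st: Mapping, f_ts: Mapping) -> Sequence:
    """Target -> source -> target; the output has the same frames as y."""
    return f_st(f_ts(y))


# Toy two-frame, two-dimensional features (illustrative values only).
x = [[1.0, 2.0], [3.0, 4.0]]
y_aligned = [[2.0, 4.0], [6.0, 8.5]]
f_st = frame_affine(2.0, 0.0)
f_ts = frame_affine(0.5, 0.25)

conv_loss, cyc_loss, total_loss = cycle_losses(x, y_aligned, f_st, f_ts, lambda_cyc=0.5)
y_self = self_predicted_target(y_aligned, f_st, f_ts)
print(conv_loss, cyc_loss, total_loss)  # 0.125 0.25 0.25
print(y_self)                           # [[2.5, 4.5], [6.5, 9.0]]
```

Because `self_predicted_target` passes the target sequence to the source domain and back, its output has exactly as many frames as the natural target features. Under this reading of the abstract, that shared alignment is what lets the self-predicted (oversmoothed) features be used for WaveNet fine-tuning without re-aligning them to the target speaker's recordings.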
Year: 2019
DOI: 10.1109/ACCESS.2019.2955978
Venue: IEEE Access
Keywords: Cyclic mapping flow, oversmoothed spectral features, recurrent neural network, spectral mapping, voice conversion, WaveNet fine-tuning, WaveNet vocoder
DocType: Journal
Volume: 7
ISSN: 2169-3536
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name; Order; Citations; PageRank
Patrick Lumban Tobing; 1; 15; 7.89
Yi-Chiao Wu; 2; 45; 9.42
Tomoki Hayashi; 3; 96; 18.49
Kobayashi, K.; 4; 5; 4.17
Tomoki Toda; 5; 1874; 167.18