Abstract
---
This paper presents a novel framework for high-quality parallel voice conversion (VC) using a cyclic recurrent neural network (RNN) and a fine-tuned WaveNet vocoder. With the proposed system, we tackle the quality degradation that WaveNet suffers when it is fed with estimated (oversmoothed) speech features, such as mel-cepstrum parameters predicted by a statistical model. In VC, providing predicted features to fine-tune a pretrained WaveNet model is not straightforward owing to the difference in time-sequence alignment. To overcome this problem, we propose a cyclic spectral conversion network that performs both a conversion flow (source-to-target) and a cyclic flow (generating self-predicted target speaker features), and is trained using both a conversion loss and a cyclic loss. The experimental results demonstrate that, overall, the proposed system significantly improves the converted speech, achieving a mean opinion score of 3.79 and a speaker similarity score of 73.86%.
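The two-term training objective described in the abstract (a conversion loss on source-to-target predictions plus a cyclic loss on self-predicted target features) can be sketched as follows. This is only an illustrative assumption: the feature shapes, the L1 distance, and all variable names are made up here, not taken from the paper's actual implementation.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between predicted and reference feature frames."""
    return float(np.mean(np.abs(pred - target)))

rng = np.random.default_rng(0)
T, D = 100, 25  # frames x mel-cepstral dimensions (illustrative sizes)

# Ground-truth target-speaker spectral features.
tgt_feats = rng.normal(size=(T, D))
# Conversion flow: features predicted from the source speaker (hypothetical output).
conv_feats = tgt_feats + 0.10 * rng.normal(size=(T, D))
# Cyclic flow: self-predicted target features fed back through the network (hypothetical).
cyc_feats = tgt_feats + 0.05 * rng.normal(size=(T, D))

# Total objective: conversion loss + cyclic loss, as sketched from the abstract.
total_loss = l1_loss(conv_feats, tgt_feats) + l1_loss(cyc_feats, tgt_feats)
print(total_loss)
```

In this sketch the cyclic term exposes the network to its own (oversmoothed) predictions during training, which is the mechanism the paper uses to generate matched estimated features for WaveNet fine-tuning.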
Year | DOI | Venue
---|---|---
2019 | 10.1109/icassp.2019.8682156 | 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)

Keywords | Field | DocType
---|---|---
voice conversion, cyclic recurrent neural network, WaveNet fine-tuning, oversmoothed parameters | Logic gate, Pattern recognition, Computer science, Convolution, Recurrent neural network, Feature extraction, Mean opinion score, Artificial intelligence, Statistical model | Conference

ISSN | Citations | PageRank
---|---|---
1520-6149 | 0 | 0.34

References | Authors
---|---
0 | 5
Name | Order | Citations | PageRank
---|---|---|---
Patrick Lumban Tobing | 1 | 15 | 7.89 |
Yi-Chiao Wu | 2 | 45 | 9.42 |
Tomoki Hayashi | 3 | 96 | 18.49 |
Kazuhiro Kobayashi | 4 | 66 | 9.91 |
Tomoki Toda | 5 | 1874 | 167.18 |