Title
Voice conversion in time-invariant speaker-independent space
Abstract
In this paper, we present a voice conversion (VC) method that uses conditional restricted Boltzmann machines (CRBMs), one per speaker, to obtain time-invariant speaker-independent spaces in which voice features can be converted more easily than in the original acoustic feature space. First, we train two CRBMs, one for the source speaker and one for the target speaker, independently on speaker-dependent training data (without the need to parallelize the training data). Then, a small amount of parallel data is fed into each CRBM, and the high-order features produced by the CRBMs are used to train a neural network (NN) that connects the two CRBMs. Finally, the entire network (the two CRBMs and the NN) is fine-tuned on the parallel acoustic data. Voice-conversion experiments confirmed the high performance of our method in both objective and subjective evaluations, compared with conventional GMM, NN, and speaker-dependent DBN approaches.
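The three-stage recipe in the abstract (speaker-dependent CRBM pre-training, a connecting mapping trained on a small parallel set in the high-order feature space, then conversion through the learned spaces) can be sketched in miniature. The toy NumPy sketch below is not the paper's implementation: the dimensions, learning rate, mean-field CD-1 updates, and the least-squares linear map standing in for the connecting NN (the joint fine-tuning stage is omitted) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CRBM:
    """Minimal CRBM-like feature extractor (illustrative, not the paper's model).

    Hidden units are conditioned on the current frame v_t and the previous
    frame v_{t-1} (the "conditional" part); trained with one-step
    contrastive divergence using mean-field reconstructions.
    """
    def __init__(self, n_vis, n_hid, lr=0.01):
        self.W = rng.normal(0, 0.01, (n_vis, n_hid))  # visible-to-hidden weights
        self.A = rng.normal(0, 0.01, (n_vis, n_hid))  # past-frame-to-hidden weights
        self.b = np.zeros(n_vis)                      # visible bias
        self.c = np.zeros(n_hid)                      # hidden bias
        self.lr = lr

    def hidden_probs(self, v, v_past):
        return sigmoid(v @ self.W + v_past @ self.A + self.c)

    def train_step(self, v, v_past):
        h0 = self.hidden_probs(v, v_past)       # positive phase
        v1 = h0 @ self.W.T + self.b             # one Gibbs step, mean reconstruction
        h1 = self.hidden_probs(v1, v_past)      # negative phase
        n = len(v)
        self.W += self.lr * (v.T @ h0 - v1.T @ h1) / n
        self.A += self.lr * (v_past.T @ h0 - v_past.T @ h1) / n
        self.b += self.lr * (v - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

def features(crbm, frames):
    """High-order features: hidden activations for each frame."""
    past = np.vstack([np.zeros(frames.shape[1]), frames[:-1]])
    return crbm.hidden_probs(frames, past)

# Toy data standing in for acoustic frames of two speakers.
D, H, T = 8, 6, 200
src = rng.normal(size=(T, D))
tgt = rng.normal(size=(T, D))

# 1) Speaker-dependent CRBM training (no parallel data needed here).
crbm_s, crbm_t = CRBM(D, H), CRBM(D, H)
for _ in range(50):
    crbm_s.train_step(src, np.vstack([np.zeros(D), src[:-1]]))
    crbm_t.train_step(tgt, np.vstack([np.zeros(D), tgt[:-1]]))

# 2) Fit a connecting map on a small "parallel" subset, in feature space.
#    A least-squares linear map stands in for the connecting NN.
hs, ht = features(crbm_s, src[:40]), features(crbm_t, tgt[:40])
M, *_ = np.linalg.lstsq(hs, ht, rcond=None)

# 3) Conversion path: source frames -> source features -> target feature space.
mapped = features(crbm_s, src) @ M
print(mapped.shape)  # → (200, 6)
```

In the paper, this conversion path would be followed by decoding through the target speaker's CRBM and joint fine-tuning of the whole network; the sketch stops at the mapped feature space to keep the three stages visible.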
Year: 2014
DOI: 10.1109/ICASSP.2014.6855136
Venue: Acoustics, Speech and Signal Processing
Keywords: Boltzmann machines, learning (artificial intelligence), speech processing, CRBM, acoustic feature space, acoustic parallel data, conditional restricted Boltzmann machine, neural network, speaker-dependent training data, time-invariant speaker-independent spaces, voice conversion, voice features, deep learning, speaker-specific features
DocType: Conference
ISSN: 1520-6149
Citations: 4
PageRank: 0.41
References: 14
Authors: 3

Name                Order  Citations  PageRank
Toru Nakashika      1      81         13.60
Tetsuya Takiguchi   2      85         8.77
Yasuo Ariki         3      5198       8.94