| Abstract |
| --- |
| In this paper, we present a voice conversion (VC) method that utilizes conditional restricted Boltzmann machines (CRBMs) for each speaker to obtain time-invariant speaker-independent spaces where voice features are converted more easily than those in an original acoustic feature space. First, we train two CRBMs for a source and target speaker independently using speaker-dependent training data (without the need to parallelize the training data). Then, a small number of parallel data are fed into each CRBM and the high-order features produced by the CRBMs are used to train a concatenating neural network (NN) between the two CRBMs. Finally, the entire network (the two CRBMs and the NN) is fine-tuned using the acoustic parallel data. Through voice-conversion experiments, we confirmed the high performance of our method in terms of objective and subjective evaluations, comparing it with conventional GMM, NN, and speaker-dependent DBN approaches. |
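The abstract's three-stage pipeline (speaker-dependent CRBM pre-training on non-parallel data, a mapping between the two hidden spaces learned from a little parallel data, then conversion through the target CRBM) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the CRBM here is a toy Gaussian-Bernoulli conditional RBM trained with CD-1 on random stand-in features, the connecting NN is replaced by ridge regression for brevity, joint fine-tuning is omitted, and all dimensions and names (`CRBM`, `src_frames`, etc.) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CRBM:
    # Toy Gaussian-Bernoulli conditional RBM: both visible and hidden
    # units receive directed connections from a window of p past frames.
    def __init__(self, n_vis, n_hid, p=2):
        self.p = p
        self.W = rng.normal(0.0, 0.01, (n_vis, n_hid))      # v_t <-> h
        self.A = rng.normal(0.0, 0.01, (n_vis * p, n_vis))  # past -> v_t
        self.B = rng.normal(0.0, 0.01, (n_vis * p, n_hid))  # past -> h
        self.b = np.zeros(n_hid)
        self.c = np.zeros(n_vis)

    def hidden(self, v, ctx):
        return sigmoid(v @ self.W + ctx @ self.B + self.b)

    def visible(self, h, ctx):  # mean-field Gaussian reconstruction
        return h @ self.W.T + ctx @ self.A + self.c

    def train(self, frames, epochs=5, lr=1e-3):
        # CD-1 over a sequence of frames (rows = time steps)
        v = frames[self.p:]
        ctx = np.stack([frames[t - self.p:t].ravel()
                        for t in range(self.p, len(frames))])
        for _ in range(epochs):
            h0 = self.hidden(v, ctx)
            v1 = self.visible(h0, ctx)
            h1 = self.hidden(v1, ctx)
            n = len(v)
            self.W += lr * (v.T @ h0 - v1.T @ h1) / n
            self.B += lr * (ctx.T @ (h0 - h1)) / n
            self.A += lr * (ctx.T @ (v - v1)) / n
            self.b += lr * (h0 - h1).mean(axis=0)
            self.c += lr * (v - v1).mean(axis=0)
        return v, ctx

# Stage 1: speaker-dependent CRBMs trained on non-parallel data.
D, H, P = 24, 32, 2
src_frames = rng.normal(size=(200, D))   # stand-ins for acoustic features
tgt_frames = rng.normal(size=(200, D))
src_crbm, tgt_crbm = CRBM(D, H, P), CRBM(D, H, P)
vs, cs = src_crbm.train(src_frames)
vt, ct = tgt_crbm.train(tgt_frames)

# Stage 2: connect the two hidden spaces using a small parallel set
# (ridge regression stands in for the paper's concatenating NN).
hs = src_crbm.hidden(vs, cs)             # treating these frames as parallel
ht = tgt_crbm.hidden(vt, ct)
M = np.linalg.solve(hs.T @ hs + 1e-2 * np.eye(H), hs.T @ ht)

# Stage 3: conversion (joint fine-tuning of the full stack omitted).
h_conv = hs @ M                          # source hidden -> target hidden
v_conv = tgt_crbm.visible(h_conv, ct)    # decode in the target CRBM
print(v_conv.shape)
```

In the paper the connecting network and both CRBMs are additionally fine-tuned end-to-end on the parallel data; this sketch stops at the pre-trained mapping to keep the pipeline visible.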
| Year | DOI | Venue |
| --- | --- | --- |
| 2014 | 10.1109/ICASSP.2014.6855136 | Acoustics, Speech and Signal Processing |
| Keywords | DocType | ISSN |
| --- | --- | --- |
| Boltzmann machines, learning (artificial intelligence), speech processing, CRBM, acoustic feature space, acoustic parallel data, conditional restricted Boltzmann machines, neural network, speaker-dependent training data, time-invariant speaker-independent spaces, voice conversion, voice features, conditional restricted Boltzmann machine, deep learning, speaker specific features | Conference | 1520-6149 |

| Citations | PageRank | References |
| --- | --- | --- |
| 4 | 0.41 | 14 |
Authors (3):

| Name | Order | Citations | PageRank |
| --- | --- | --- | --- |
| Toru Nakashika | 1 | 81 | 13.60 |
| Tetsuya Takiguchi | 2 | 85 | 8.77 |
| Yasuo Ariki | 3 | 519 | 88.94 |