| Abstract |
| --- |
| In this paper, we present a voice conversion (VC) method that utilizes conditional restricted Boltzmann machines (CRBMs) for each speaker to obtain time-invariant speaker-independent spaces where voice features are converted more easily than those in an original acoustic feature space. First, we train two CRBMs for a source and target speaker independently using speaker-dependent training data (without the need to parallelize the training data). Then, a small number of parallel data are fed into each CRBM and the high-order features produced by the CRBMs are used to train a concatenating neural network (NN) between the two CRBMs. Finally, the entire network (the two CRBMs and the NN) is fine-tuned using the acoustic parallel data. Through voice-conversion experiments, we confirmed the high performance of our method in terms of objective and subjective evaluations, comparing it with conventional GMM, NN, and speaker-dependent DBN approaches. |
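The abstract's three-stage pipeline (speaker-dependent CRBM pre-training on non-parallel data, a mapping between the two hidden spaces learned from a little parallel data, then conversion through the target CRBM) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the CRBM here is a toy Gaussian-Bernoulli conditional RBM trained with CD-1 on random stand-in features, the connecting NN is replaced by ridge regression for brevity, joint fine-tuning is omitted, and all dimensions and names (`CRBM`, `src_frames`, etc.) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CRBM:
    # Toy Gaussian-Bernoulli conditional RBM: both visible and hidden
    # units receive directed connections from a window of p past frames.
    def __init__(self, n_vis, n_hid, p=2):
        self.p = p
        self.W = rng.normal(0.0, 0.01, (n_vis, n_hid))      # v_t <-> h
        self.A = rng.normal(0.0, 0.01, (n_vis * p, n_vis))  # past -> v_t
        self.B = rng.normal(0.0, 0.01, (n_vis * p, n_hid))  # past -> h
        self.b = np.zeros(n_hid)
        self.c = np.zeros(n_vis)

    def hidden(self, v, ctx):
        return sigmoid(v @ self.W + ctx @ self.B + self.b)

    def visible(self, h, ctx):  # mean-field Gaussian reconstruction
        return h @ self.W.T + ctx @ self.A + self.c

    def train(self, frames, epochs=5, lr=1e-3):
        # CD-1 over a sequence of frames (rows = time steps)
        v = frames[self.p:]
        ctx = np.stack([frames[t - self.p:t].ravel()
                        for t in range(self.p, len(frames))])
        for _ in range(epochs):
            h0 = self.hidden(v, ctx)
            v1 = self.visible(h0, ctx)
            h1 = self.hidden(v1, ctx)
            n = len(v)
            self.W += lr * (v.T @ h0 - v1.T @ h1) / n
            self.B += lr * (ctx.T @ (h0 - h1)) / n
            self.A += lr * (ctx.T @ (v - v1)) / n
            self.b += lr * (h0 - h1).mean(axis=0)
            self.c += lr * (v - v1).mean(axis=0)
        return v, ctx

# Stage 1: speaker-dependent CRBMs trained on non-parallel data.
D, H, P = 24, 32, 2
src_frames = rng.normal(size=(200, D))   # stand-ins for acoustic features
tgt_frames = rng.normal(size=(200, D))
src_crbm, tgt_crbm = CRBM(D, H, P), CRBM(D, H, P)
vs, cs = src_crbm.train(src_frames)
vt, ct = tgt_crbm.train(tgt_frames)

# Stage 2: connect the two hidden spaces using a small parallel set
# (ridge regression stands in for the paper's concatenating NN).
hs = src_crbm.hidden(vs, cs)             # treating these frames as parallel
ht = tgt_crbm.hidden(vt, ct)
M = np.linalg.solve(hs.T @ hs + 1e-2 * np.eye(H), hs.T @ ht)

# Stage 3: conversion (joint fine-tuning of the full stack omitted).
h_conv = hs @ M                          # source hidden -> target hidden
v_conv = tgt_crbm.visible(h_conv, ct)    # decode in the target CRBM
print(v_conv.shape)
```

In the paper the connecting network and both CRBMs are additionally fine-tuned end-to-end on the parallel data; this sketch stops at the pre-trained mapping to keep the pipeline visible.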
| Year | DOI | Venue |
| --- | --- | --- |
| 2014 | 10.1109/ICASSP.2014.6855136 | Acoustics, Speech and Signal Processing |
| Keywords | DocType | ISSN |
| --- | --- | --- |
| Boltzmann machines, learning (artificial intelligence), speech processing, CRBM, acoustic feature space, acoustic parallel data, conditional restricted Boltzmann machines, neural network, speaker-dependent training data, time-invariant speaker-independent spaces, voice conversion, voice features, conditional restricted Boltzmann machine, deep learning, speaker specific features | Conference | 1520-6149 |

| Citations | PageRank | References |
| --- | --- | --- |
| 4 | 0.41 | 14 |
Authors (3):

| Name | Order | Citations | PageRank |
| --- | --- | --- | --- |
| Toru Nakashika | 1 | 81 | 13.60 |
| Tetsuya Takiguchi | 2 | 85 | 8.77 |
| Yasuo Ariki | 3 | 519 | 88.94 |