Abstract | ||
---|---|---|
Cross-lingual voice conversion (CLVC) is challenging because the source and target speakers speak different languages. It is essential for applications such as mixed-language speech synthesis and the customization of speaking devices. This paper proposes a deep neural network (DNN)-based approach to CLVC that uses bottleneck features. In the proposed method, the speaker-independent information present in speech signals from different languages is represented by bottleneck features extracted from a deep auto-encoder. A DNN model is trained to learn the mapping between the bottleneck features and the corresponding spectral features of the target speaker. The proposed approach can capture the speaker-specific characteristics of a target speaker and requires no speech data from the source speaker during training. The performance of the proposed method is evaluated using data from three Indian languages: Telugu, Tamil, and Malayalam. The experimental results show that the proposed method can effectively convert the source speaker's voice to the target speaker's voice in a cross-lingual scenario. |
Field | Value |
---|---|
Year | 2020 |
DOI | 10.1007/s11063-019-10149-y |
Venue | Neural Processing Letters |
Keywords | Cross-lingual voice conversion, Deep autoencoder, Deep neural network, Gaussian mixture model |
DocType | Journal |
Volume | 51 |
Issue | 2 |
ISSN | 1370-4621 |
Citations | 0 |
PageRank | 0.34 |
References | 0 |
Authors | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
M. Kiran Reddy | 1 | 3 | 2.11 |
K. Sreenivasa Rao | 2 | 649 | 60.90 |