Title |
---|
Non-parallel voice conversion using joint optimization of alignment by temporal context and spectral distortion |
Abstract |
---|
Many voice conversion systems require parallel training sets of the source and target speakers. Non-parallel training is more complicated, as it involves estimating the source-target correspondence along with the conversion function itself. INCA is a recently proposed method for non-parallel training, based on iterative estimation of the alignment and the conversion function. The alignment is evaluated using a simple nearest-neighbor search, which often leads to phonetically mismatched source-target pairs. We propose here a generalized approach, denoted Temporal-Context INCA (TC-INCA), based on matching temporal context vectors. We formulate the training stage as a minimization problem of a joint cost, considering both the context-based alignment and the conversion function. We show that TC-INCA reduces the joint cost and prove its convergence. Experimental results indicate that TC-INCA significantly improves the alignment accuracy compared to INCA. Moreover, subjective evaluations show that TC-INCA leads to improved quality of the synthesized output signals when small training sets are used. |
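
The alternating scheme described in the abstract (nearest-neighbor alignment on temporal context vectors, then re-estimation of the conversion function) can be illustrated with a toy sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the function names (`context_vectors`, `nearest_neighbours`, `train`), the edge-clamped context window, and above all the conversion function, which here is a single additive bias fitted by least squares rather than the GMM-based mapping suggested by the keywords. It is not the authors' implementation.

```python
def context_vectors(frames, width):
    """Stack each frame with `width` neighbours on each side (edges clamped)."""
    n = len(frames)
    out = []
    for t in range(n):
        stacked = []
        for k in range(-width, width + 1):
            stacked.extend(frames[min(max(t + k, 0), n - 1)])
        out.append(stacked)
    return out


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_neighbours(src_ctx, tgt_ctx):
    """For each source context vector, index of the closest target one."""
    return [min(range(len(tgt_ctx)), key=lambda j: sq_dist(s, tgt_ctx[j]))
            for s in src_ctx]


def joint_cost(src_ctx, tgt_ctx, bias_ctx, pairs):
    """Sum of squared distances between converted source and aligned target."""
    return sum(
        sq_dist([s + b for s, b in zip(src_ctx[i], bias_ctx)], tgt_ctx[j])
        for i, j in enumerate(pairs)
    )


def train(source, target, width=1, max_iter=20):
    """Alternate alignment and conversion updates (toy bias-only conversion)."""
    dim = len(source[0])
    blocks = 2 * width + 1          # frames per context vector
    src_ctx = context_vectors(source, width)
    tgt_ctx = context_vectors(target, width)
    bias = [0.0] * dim              # assumed conversion: F(x) = x + bias
    pairs, costs = None, []
    for _ in range(max_iter):
        # Alignment step: nearest neighbour of each converted context vector.
        bias_ctx = bias * blocks
        converted = [[s + b for s, b in zip(c, bias_ctx)] for c in src_ctx]
        new_pairs = nearest_neighbours(converted, tgt_ctx)
        if new_pairs == pairs:      # alignment unchanged: stop iterating
            break
        pairs = new_pairs
        # Conversion step: least-squares bias for the fixed alignment,
        # i.e. the mean target-minus-source difference over all stacked frames.
        bias = [0.0] * dim
        for i, j in enumerate(pairs):
            for k in range(blocks):
                for d in range(dim):
                    bias[d] += tgt_ctx[j][k * dim + d] - src_ctx[i][k * dim + d]
        bias = [b / (len(pairs) * blocks) for b in bias]
        costs.append(joint_cost(src_ctx, tgt_ctx, bias * blocks, pairs))
    return bias, pairs, costs
```

In this toy setting both steps minimize the same sum of squared context-vector distances, one over the alignment and one over the bias, so the recorded cost cannot increase from one iteration to the next. Setting `width=0` reduces the alignment to a plain frame-level nearest-neighbor search, which is the INCA-style baseline the abstract contrasts against.
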
Year | DOI | Venue |
---|---|---|
2014 | 10.1109/ICASSP.2014.6855140 | Acoustics, Speech and Signal Processing |
Keywords | Field | DocType |
---|---|---|
iterative methods, optimisation, signal processing, iterative estimation, joint optimization, minimization problem, nonparallel voice conversion, spectral distortion, temporal context, Gaussian Mixture Model (GMM), INCA, Non-Parallel Voice Conversion, Spectral Distance | Minimization problem, Convergence (routing), Pattern recognition, Computer science, Artificial intelligence, Temporal context, Joint cost, Spectral distortion | Conference |
ISSN | Citations | PageRank |
---|---|---|
1520-6149 | 4 | 0.42 |
References | Authors |
---|---|
10 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hadas Benisty | 1 | 19 | 2.17 |
David Malah | 2 | 219 | 60.95 |
Koby Crammer | 3 | 5252 | 466.86 |