Title
Non-parallel voice conversion using joint optimization of alignment by temporal context and spectral distortion
Abstract
Many voice conversion systems require parallel training sets of the source and target speakers. Non-parallel training is more complicated, as it involves estimating the source-target correspondence along with the conversion function itself. INCA is a recently proposed method for non-parallel training, based on iterative estimation of the alignment and the conversion function. The alignment is obtained using a simple nearest-neighbor search, which often leads to phonetically mismatched source-target pairs. We propose here a generalized approach, denoted Temporal-Context INCA (TC-INCA), based on matching temporal context vectors. We formulate the training stage as a minimization problem of a joint cost that accounts for both the context-based alignment and the conversion function. We show that TC-INCA reduces the joint cost and prove its convergence. Experimental results indicate that TC-INCA significantly improves alignment accuracy compared to INCA. Moreover, subjective evaluations show that TC-INCA improves the quality of the synthesized output signals when small training sets are used.
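The alternating alignment/conversion idea in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the per-dimension affine conversion (a stand-in for the GMM mapping named in the keywords), the squared-distance joint cost over bidirectional nearest-neighbor pairs, the context width `k`, and all function names (`tc_inca_sketch`, `fit_affine`, `align`, etc.) are hypothetical choices made for illustration. Each pass first matches temporal context vectors of the converted source against the target (alignment step), then refits the conversion on the context-expanded pairs (conversion step); with `k=0` the matching degenerates to frame-wise nearest neighbors, as in plain INCA.

```python
import math
from typing import List, Tuple

Frame = List[float]
Params = List[Tuple[float, float]]  # hypothetical per-dimension (scale, offset)


def sq_dist(u: Frame, v: Frame) -> float:
    """Squared Euclidean (spectral) distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def clip(t: int, n: int) -> int:
    return min(max(t, 0), n - 1)


def context_vector(frames: List[Frame], t: int, k: int) -> Frame:
    """Stack frames t-k..t+k into one vector, repeating edge frames."""
    out: Frame = []
    for o in range(-k, k + 1):
        out.extend(frames[clip(t + o, len(frames))])
    return out


def convert(params: Params, frames: List[Frame]) -> List[Frame]:
    """Frame-wise affine conversion (assumed stand-in for a GMM mapping)."""
    return [[a * x + b for x, (a, b) in zip(f, params)] for f in frames]


def nearest(query: Frame, pool: List[Frame]) -> int:
    return min(range(len(pool)), key=lambda idx: sq_dist(query, pool[idx]))


def align(conv_src: List[Frame], tgt: List[Frame], k: int) -> List[Tuple[int, int]]:
    """Bidirectional nearest-neighbor matching of temporal context vectors."""
    cs = [context_vector(conv_src, i, k) for i in range(len(conv_src))]
    ct = [context_vector(tgt, j, k) for j in range(len(tgt))]
    src_to_tgt = [(i, nearest(cs[i], ct)) for i in range(len(cs))]
    tgt_to_src = [(nearest(ct[j], cs), j) for j in range(len(ct))]
    return src_to_tgt + tgt_to_src


def expanded_pairs(src, tgt, pairs, k):
    """Each aligned (i, j) contributes the 2k+1 frame pairs of its context."""
    xs, ys = [], []
    for i, j in pairs:
        for o in range(-k, k + 1):
            xs.append(src[clip(i + o, len(src))])
            ys.append(tgt[clip(j + o, len(tgt))])
    return xs, ys


def joint_cost(params, src, tgt, pairs, k) -> float:
    """Assumed joint cost: context distortion of converted source vs. target."""
    xs, ys = expanded_pairs(src, tgt, pairs, k)
    return sum(sq_dist(x, y) for x, y in zip(convert(params, xs), ys))


def fit_affine(xs: List[Frame], ys: List[Frame]) -> Params:
    """Per-dimension least-squares fit of y = a * x + b."""
    params: Params = []
    for d in range(len(xs[0])):
        xd = [x[d] for x in xs]
        yd = [y[d] for y in ys]
        mx, my = sum(xd) / len(xd), sum(yd) / len(yd)
        var = sum((u - mx) ** 2 for u in xd)
        cov = sum((u - mx) * (v - my) for u, v in zip(xd, yd))
        a = cov / var if var > 0.0 else 1.0  # any scale fits if x is constant
        params.append((a, my - a * mx))
    return params


def tc_inca_sketch(src, tgt, k=1, iters=10):
    """Alternate alignment and conversion; k=0 gives frame-wise NN matching."""
    params: Params = [(1.0, 0.0)] * len(src[0])  # start from the identity map
    pairs: List[Tuple[int, int]] = []
    costs: List[float] = []
    for _ in range(iters):
        pairs = align(convert(params, src), tgt, k)  # alignment step
        costs.append(joint_cost(params, src, tgt, pairs, k))
        params = fit_affine(*expanded_pairs(src, tgt, pairs, k))  # conversion step
    return params, pairs, costs


# Toy non-parallel data: different lengths and time offsets, 2-D "spectral" frames.
src = [[math.sin(0.3 * t), math.cos(0.2 * t)] for t in range(30)]
tgt = [[2.0 * math.sin(0.3 * t) + 1.0, 0.5 * math.cos(0.2 * t) - 0.3] for t in range(5, 40)]
params, pairs, costs = tc_inca_sketch(src, tgt, k=1, iters=8)
```

Because each step minimizes this assumed cost with the other variable held fixed (nearest neighbors for the pairs, least squares for the conversion parameters), the recorded `costs` are non-increasing. That mirrors the flavor of the convergence claim in the abstract, but only for the simplified cost and conversion function chosen here.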
Year
2014
DOI
10.1109/ICASSP.2014.6855140
Venue
Acoustics, Speech and Signal Processing
Keywords
iterative methods, optimisation, signal processing, iterative estimation, joint optimization, minimization problem, nonparallel voice conversion, spectral distortion, temporal context, Gaussian Mixture Model (GMM), INCA, Non-Parallel Voice Conversion, Spectral Distance
Field
Minimization problem, Convergence (routing), Pattern recognition, Computer science, Artificial intelligence, Temporal context, Joint cost, Spectral distortion
DocType
Conference
ISSN
1520-6149
Citations
4
PageRank
0.42
References
10
Authors
3
Name | Order | Citations | PageRank
Hadas Benisty | 1 | 19 | 2.17
David Malah | 2 | 219 | 60.95
Koby Crammer | 3 | 5252 | 466.86