Title
Spectral Voice Conversion For Text-To-Speech Synthesis
Abstract
A new voice conversion algorithm that modifies a source speaker's speech to sound as if produced by a target speaker is presented. It is applied to a residual-excited LPC text-to-speech diphone synthesizer. Spectral parameters are mapped using a locally linear transformation based on Gaussian mixture models whose parameters are trained by joint density estimation. The LPC residuals are adjusted to match the target speaker's average pitch. To study the effect of the amount of training data on performance, data sets of varying sizes are created by automatically selecting subsets of all available diphones with a vector quantization method. In an objective evaluation, the proposed method is found to perform more reliably for small training sets than a previous approach. In perceptual tests, nearly optimal spectral conversion performance was achieved even with a small amount of training data; however, speech quality improved with increases in the training set size.
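For illustration, the locally linear GMM-based mapping described in the abstract can be viewed as joint-density regression: a GMM is fit on stacked source-target spectral vectors, and each source frame is converted with the mixture-weighted linear regressors implied by the component means and covariances. The following is a minimal Python sketch under those assumptions; the function names and the use of NumPy/SciPy/scikit-learn are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def train_joint_gmm(src_feats, tgt_feats, n_components=8):
    """Fit a GMM on joint [source; target] spectral feature vectors.

    src_feats, tgt_feats: (N, d) arrays of time-aligned spectral
    parameters (e.g. line spectral frequencies) for the two speakers.
    """
    joint = np.hstack([src_feats, tgt_feats])  # (N, 2d)
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(joint)
    return gmm

def convert_frame(gmm, x):
    """Map one source frame x (shape (d,)) toward the target speaker
    using the locally linear regression implied by the joint-density GMM."""
    d = x.shape[0]
    mu_x = gmm.means_[:, :d]            # source-part means,   (M, d)
    mu_y = gmm.means_[:, d:]            # target-part means,   (M, d)
    S_xx = gmm.covariances_[:, :d, :d]  # source covariances,  (M, d, d)
    S_yx = gmm.covariances_[:, d:, :d]  # cross covariances,   (M, d, d)
    w = gmm.weights_

    # Posterior probability of each mixture component given the source frame.
    lik = np.array([w[i] * multivariate_normal.pdf(x, mu_x[i], S_xx[i])
                    for i in range(len(w))])
    post = lik / lik.sum()

    # Per-component conditional mean E[y | x, component i], then mix.
    y_i = np.array([mu_y[i] + S_yx[i] @ np.linalg.solve(S_xx[i], x - mu_x[i])
                    for i in range(len(w))])
    return post @ y_i  # (d,) converted spectral vector
```

A typical use would be `gmm = train_joint_gmm(X, Y)` followed by `convert_frame(gmm, X[t])` for each frame; residual and pitch adjustment, as described in the abstract, would be handled separately.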
Year
1998
DOI
10.1109/ICASSP.1998.674423
Venue
Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols. 1-6
Keywords
gaussian mixture models, linear predictive coding, loudspeakers, quality improvement, text to speech, speech synthesis, density estimation, gaussian mixture model, linear transformation, parameter estimation, gaussian processes, natural languages, testing, vector quantization, training data
Field
Density estimation, Speech synthesis, Diphone, Pattern recognition, Computer science, Speech recognition, Vector quantization, Artificial intelligence, Estimation theory, Loudspeaker, Mixture model, Linear predictive coding
DocType
Conference
ISSN
1520-6149
Citations
231
PageRank
13.86
References
5
Authors
2
Name                Order   Citations   PageRank
Alexander Kain      1       377         32.39
Michael W. Macon    2       334         25.79