Title
Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization
Abstract
An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language.
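The two transform types named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single diagonal-covariance Gaussian, uses hypothetical function names and toy numbers, and leaves out the cluster-dependent decision trees and all parameter estimation. It only shows the generic form of cluster mean interpolation (a mean built as a weighted sum of cluster means) and of a constrained MLLR transform (an affine map applied to the observation, with a log-determinant term in the likelihood).

```python
import math

def interpolate_cluster_means(cluster_means, weights):
    """Return the weighted sum of cluster mean vectors (one mean per cluster)."""
    dim = len(cluster_means[0])
    return [sum(w * mu[d] for w, mu in zip(weights, cluster_means))
            for d in range(dim)]

def log_abs_det(matrix):
    """Compute log|det(matrix)| by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("transform matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return total

def cmllr_log_likelihood(x, A, b, mean, var):
    """Log-likelihood of x after the affine feature transform A x + b.

    The Gaussian has diagonal variance `var`; log|det A| is the Jacobian term.
    """
    x_hat = [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]
    log_gauss = -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xh - m) ** 2 / v
        for xh, m, v in zip(x_hat, mean, var)
    )
    return log_gauss + log_abs_det(A)

# Toy values, chosen only for illustration.
cluster_means = [[0.0, 0.0], [2.0, 4.0]]
language_weights = [1.0, 0.5]          # hypothetical language-specific weights
mean = interpolate_cluster_means(cluster_means, language_weights)

A = [[2.0, 0.0], [0.0, 1.0]]           # hypothetical speaker-specific transform
b = [-1.0, 0.0]
x = [1.0, 2.0]
ll = cmllr_log_likelihood(x, A, b, mean, [1.0, 1.0])
```

In this sketch the language weights alter the model mean while the speaker transform alters the observation, which is one simple way to picture the two factors being handled by separate transforms.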
Year: 2012
DOI: 10.1109/TASL.2012.2187195
Venue: IEEE Transactions on Audio, Speech, and Language Processing
Keywords: decision trees, matrix decomposition, hidden markov model, speech synthesis, interpolation, regression analysis, hidden markov models, decision tree, speech recognition
Field: Speech synthesis, Pattern recognition, Computer science, Markov model, Matrix decomposition, Interpolation, Speech recognition, Speaker recognition, Parametric statistics, Artificial intelligence, Constructed language, Hidden Markov model
DocType: Journal
Volume: 20
Issue: 6
ISSN: 1558-7916
Citations: 12
PageRank: 0.58
References: 23
Authors (7)
Name                    Order  Citations  PageRank
Heiga Zen               1      1922       103.73
Norbert Braunschweiler  2      59         8.47
Sabine Buchholz         3      56         3.96
Mark J. F. Gales        4      3905       367.45
Kate Knill              5      249        28.02
Sacha Krstulovic        6      106        11.97
Javier Latorre          7      61         5.09