Title
Phoneme Dependent Speaker Embedding And Model Factorization For Multi-Speaker Speech Synthesis And Adaptation
Abstract
This paper presents an architecture for speaker adaptation in a long short-term memory (LSTM) based Mandarin statistical parametric speech synthesis system. Conventional methods rely on fixed, global, utterance-level speaker representations developed for the speaker recognition task. In contrast, the proposed method extracts speaker representations at both the utterance level and the phoneme level, which capture more phoneme-level pronunciation characteristics. An attention mechanism dynamically combines the representations from each level to train a task-specific, phoneme-dependent speaker embedding. To handle the unbalanced database and avoid over-fitting, the model is factored into an average model and an adaptation model, which are combined by an attention mechanism. We investigate the performance of speaker representations extracted by different methods. Experimental results confirm the adaptability of the proposed speaker embedding and model factorization structure, and listening tests demonstrate that the proposed method achieves better adaptation performance than the baselines in terms of naturalness and speaker similarity.
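As a rough illustration of the idea of combining utterance-level and phoneme-level speaker representations with attention, the sketch below mixes two vectors using scaled dot-product attention weights. This is not the paper's implementation: the function name `combine_speaker_representations`, the `query` vector, and the choice of scaled dot-product scoring are all assumptions made for this example, since the abstract does not specify how the attention scores are computed.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def combine_speaker_representations(utt_vec, phone_vec, query):
    """Attention-weighted mix of two speaker representations.

    utt_vec, phone_vec, query: hypothetical 1-D arrays of equal length.
    Scaled dot-product scoring is an assumption, not the paper's method.
    """
    candidates = np.stack([utt_vec, phone_vec])         # shape (2, d)
    scores = candidates @ query / np.sqrt(query.size)   # shape (2,)
    weights = softmax(scores)                           # non-negative, sums to 1
    embedding = weights @ candidates                    # shape (d,)
    return embedding, weights


# Example with made-up 4-dimensional vectors.
utt = np.array([0.2, 0.1, 0.0, 0.5])
phone = np.array([0.4, 0.3, 0.1, 0.0])
q = np.array([1.0, 0.0, 0.0, 1.0])
emb, w = combine_speaker_representations(utt, phone, q)
```

In a trained system the query and the representations would come from learned network layers and the weights would change per phoneme; here they are fixed arrays so that only the mixing step is shown.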
Year: 2019
DOI: 10.1109/icassp.2019.8682535
Venue: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords: speech synthesis, speaker adaptation, speaker embedding, phoneme representation
Field: Data modeling, Speech synthesis, Embedding, Pattern recognition, Computer science, Naturalness, Utterance, Active listening, Speech recognition, Parametric statistics, Speaker recognition, Artificial intelligence
DocType: Conference
ISSN: 1520-6149
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name | Order | Citations | PageRank
Ruibo Fu | 1 | 1 | 5.11
Jianhua Tao | 2 | 848 | 138.00
Zhengqi Wen | 3 | 86 | 24.41
Yibin Zheng | 4 | 38 | 15.13