Title
Multi-Speaker Modeling and Speaker Adaptation for DNN-Based TTS Synthesis
Abstract
In DNN-based TTS synthesis, the hidden layers of a DNN can be viewed as a deep transformation of linguistic features, and the output layer as a representation of the acoustic space that regresses the transformed linguistic features to acoustic parameters. The deep-layered architecture of a DNN can not only represent highly complex transformations compactly, but also take advantage of a huge amount of training data. In this paper, we propose an approach to model multi-speaker TTS with a general DNN, where the same hidden layers are shared among different speakers while the output layers are composed of speaker-dependent nodes that explain the targets of each speaker. Experimental results show that our approach significantly improves the quality of synthesized speech, both objectively and subjectively, compared with speech synthesized from individual, speaker-dependent DNN-based TTS systems. We further transfer the hidden layers to a new speaker with limited training data, and the resulting synthesized speech of the new speaker also achieves good quality in terms of naturalness and speaker similarity.
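The abstract describes a multi-speaker DNN in which hidden layers are shared across speakers and each speaker has its own output (regression) layer, with adaptation to a new speaker done by reusing the shared layers. Below is a minimal sketch of that wiring, assuming PyTorch; the layer sizes, activation choice, number of speakers, and the add_speaker helper are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class MultiSpeakerDNN(nn.Module):
    """Hypothetical sketch: shared hidden layers with speaker-dependent output layers.

    Dimensions and the number of speakers are assumed for illustration only.
    """

    def __init__(self, linguistic_dim=300, hidden_dim=1024,
                 acoustic_dim=180, num_speakers=4):
        super().__init__()
        # Hidden layers shared by all speakers: a deep transformation
        # of the input linguistic features.
        self.shared = nn.Sequential(
            nn.Linear(linguistic_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        # One speaker-dependent output layer per speaker, mapping the
        # shared representation to that speaker's acoustic parameters.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, acoustic_dim) for _ in range(num_speakers)
        )

    def forward(self, linguistic_features, speaker_id):
        h = self.shared(linguistic_features)
        return self.heads[speaker_id](h)

    def add_speaker(self, acoustic_dim=180):
        # Adaptation sketch: keep the shared hidden layers and attach a new
        # output layer to be trained on the new speaker's limited data.
        self.heads.append(nn.Linear(self.heads[0].in_features, acoustic_dim))
        return len(self.heads) - 1
```

In this sketch, multi-speaker training would update the shared layers and all heads jointly, while adapting to a new speaker would call add_speaker and train (mostly) the new head, which is one plausible reading of the transfer described in the abstract.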
Year
2015
Venue
2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Keywords
statistical parametric speech synthesis, deep neural networks, multi-task learning, transfer learning
Field
Training set, Pragmatics, Pattern recognition, Computer science, Naturalness, Speech recognition, Speaker recognition, Speaker diarisation, Artificial intelligence, Acoustic space, Hidden Markov model, Speaker adaptation
DocType
Conference
ISSN
1520-6149
Citations
9
PageRank
0.51
References
8
Authors
4
Name, Order, Citations, PageRank
Yuchen Fan, 1, 332, 17.14
Qian Yao, 2, 527, 51.55
Frank K. Soong, 3, 1395, 268.29
Lei He, 4, 96, 24.04