Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS

Paper Info

Title
Modeling Prosodic Phrasing With Multi-Task Learning in Tacotron-Based TTS

Abstract
Tacotron-based end-to-end speech synthesis has shown remarkable voice quality. However, the rendering of prosody in the synthesized speech remains to be improved, especially for long sentences, where prosodic phrasing errors can occur frequently. In this letter, we extend the Tacotron-based speech synthesis framework to explicitly model prosodic phrase breaks. We propose a multi-task learning scheme for Tacotron training that optimizes the system to predict both the mel spectrum and phrase breaks. To the best of our knowledge, this is the first implementation of multi-task learning for Tacotron-based TTS with a prosodic phrasing model. Experiments show that our proposed training scheme consistently improves voice quality for both Chinese and Mongolian systems.
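The joint objective described in the abstract can be illustrated with a small, framework-free sketch. It assumes, for illustration only, that the mel-spectrum loss is a mean squared error, that phrase breaks are predicted as a per-token binary label (1 = break after the token), and that the two losses are summed with a hypothetical weight `break_weight`. The paper's exact loss terms and weighting may differ; all function names here are hypothetical.

```python
import math
from typing import Sequence


def mel_loss(predicted: Sequence[Sequence[float]],
             target: Sequence[Sequence[float]]) -> float:
    """Mean squared error over all mel frames and channels (assumed loss)."""
    if len(predicted) != len(target):
        raise ValueError("frame count mismatch")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(predicted, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("channel count mismatch")
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count


def break_loss(break_probs: Sequence[float],
               break_labels: Sequence[int],
               eps: float = 1e-7) -> float:
    """Binary cross-entropy for per-token phrase-break prediction (assumed loss)."""
    if len(break_probs) != len(break_labels):
        raise ValueError("length mismatch")
    total = 0.0
    for p, y in zip(break_probs, break_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(break_probs)


def multitask_loss(pred_mel, target_mel, break_probs, break_labels,
                   break_weight: float = 0.5) -> float:
    """Weighted sum of both task losses; break_weight is a hypothetical hyperparameter."""
    return mel_loss(pred_mel, target_mel) + break_weight * break_loss(break_probs, break_labels)


# Toy usage: one 2-channel mel frame and one token with a labelled break.
loss = multitask_loss([[1.0, 2.0]], [[1.0, 0.0]], [0.5], [1])
```

Setting `break_weight` to 0 reduces this to a mel-only objective; in a real model, gradients from both terms would flow into whatever parameters the two tasks share, which is how the auxiliary phrase-break task can influence the synthesis network.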

Year: 2020
DOI: 10.1109/LSP.2020.3016564
Venue: IEEE Signal Processing Letters
Keywords: Task analysis, Generators, Training, Speech synthesis, Decoding, Linguistics, Data models, Tacotron, multi-task learning, prosody
Document type: Journal
Volume: 27
ISSN: 1070-9908
Citations: 4
PageRank: 0.40
References: 21
Authors: 5

Authors

Name	Order	Citations	PageRank
Rui Liu	1	6	3.81
Berrak Sisman	2	60	10.34
Fei Long	3	16	13.09
Guanglai Gao	4	10	4.31
Haizhou Li	5	3678	334.61
