Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability. - Citegraph

Paper Info

Title
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability.

Abstract
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years. However, the generated voice is often not perceptually identifiable by its intended emotion category. To address this problem, we propose a new interactive training paradigm for ETTS, denoted as i-ETTS, which seeks to directly improve the emotion discriminability by interacting with a speech emotion recognition (SER) model. Moreover, we formulate an iterative training strategy with reinforcement learning to ensure the quality of i-ETTS optimization. Experimental results demonstrate that the proposed i-ETTS outperforms the state-of-the-art baselines by rendering speech with more accurate emotion style. To the best of our knowledge, this is the first study of reinforcement learning in emotional text-to-speech synthesis.

Year	DOI	Venue
2021	10.21437/Interspeech.2021-1236	Interspeech
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
0	3

Cited by (0 rows)

References (0 rows)

Authors (3 rows)

Name	Order	Citations	PageRank
Rui Liu	1	24	3.26
Berrak Sisman	2	60	10.34
Haizhou Li	3	3678	334.61
