Title | ||
---|---|---|
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability |
Abstract | ||
---|---|---|
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years. However, the generated voice is often not perceptually identifiable as belonging to its intended emotion category. To address this problem, we propose a new interactive training paradigm for ETTS, denoted as i-ETTS, which seeks to directly improve emotion discriminability by interacting with a speech emotion recognition (SER) model. Moreover, we formulate an iterative training strategy with reinforcement learning to ensure the quality of i-ETTS optimization. Experimental results demonstrate that the proposed i-ETTS outperforms the state-of-the-art baselines by rendering speech with a more accurate emotion style. To the best of our knowledge, this is the first study of reinforcement learning in emotional text-to-speech synthesis. |
Year | DOI | Venue |
---|---|---|
2021 | 10.21437/Interspeech.2021-1236 | Interspeech |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors | |
---|---|---|
0 | 3 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Rui Liu | 1 | 24 | 3.26 |
Berrak Sisman | 2 | 60 | 10.34 |
Haizhou Li | 3 | 3678 | 334.61 |