Title | ||
---|---|---|
PAMA-TTS: Progression-Aware Monotonic Attention for Stable SEQ2SEQ TTS with Accurate Phoneme Duration Control |
Abstract | ||
---|---|---|
Sequence expansion between encoder and decoder is a critical challenge in sequence-to-sequence TTS. Attention-based methods achieve great naturalness but suffer from unstable issues like missing and repeating phonemes, not to mention accurate duration control. Duration-informed methods, on the contrary, seem to easily adjust phoneme duration but show obvious degradation in speech naturalness. This paper proposes PAMA-TTS to address the problem. It takes the advantage of both flexible attention and explicit duration models. Based on the monotonic attention mechanism, PAMA-TTS also leverages token duration and relative position of a frame, especially countdown information, i.e. in how many future frames the present phoneme will end. They help the attention to move forward along the token sequence in a soft but reliable control. Experimental results prove that PAMA-TTS achieves the highest naturalness, while has on-par or even better duration controllability than the duration-informed model. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/ICASSP43922.2022.9746202 | IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) |
DocType | Citations | PageRank |
Conference | 0 | 0.34 |
References | Authors | |
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yunchao He | 1 | 1 | 1.71 |
Jian Luan | 2 | 0 | 0.34 |
Yujun Wang | 3 | 48 | 10.48 |