Abstract | ||
---|---|---|
Recent state-of-the-art neural text-to-speech synthesis models have significantly improved the quality of synthesized speech. However, previous methods still have several problems. While autoregressive models suffer from slow inference speed, non-autoregressive models usually have a complicated, time- and memory-consuming training pipeline. This paper proposes a novel model called FastTacotr... |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/MAPR53640.2021.9585267 | 2021 International Conference on Multimedia Analysis and Pattern Recognition (MAPR) |
Keywords | DocType | ISBN |
Deep learning, text-to-speech, mel spectrogram | Conference | 978-1-6654-1910-9 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dinh Viet Sang | 1 | 0 | 0.34 |
Lam Xuan Thu | 2 | 0 | 0.34 |