Abstract | ||
---|---|---|
Because Tibetan characters have a two-dimensional structure, raw letter sequences are not a convenient input for an end-to-end speech synthesis system. Experiments are therefore conducted with phone sequences and with semi-syllable sequences as input. In both training and testing, the text is first segmented into a sequence of syllables, and each syllable is then converted into phones or semi-syllables to form the input sequence of the model. The results show that the encoder-decoder alignment of Tibetan speech synthesis is better with phones than with semi-syllables. In addition, the Highway network in the architecture plays a key role in the convergence of the model. |
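
The preprocessing described in the abstract (text to syllables, then syllables to phones or semi-syllables) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that syllables are split on the tsheg mark (U+0F0B), and it uses a hypothetical two-entry lexicon, `TOY_LEXICON`, whose unit labels are Wylie transliteration letters standing in for a real phone and semi-syllable inventory, which the abstract does not give. The function names are likewise illustrative.

```python
# Tibetan syllables are delimited by the tsheg mark; the shad mark ends a clause.
TSHEG = "\u0f0b"
SHAD = "\u0f0d"

# Two example syllables, written with escapes to avoid encoding ambiguity.
BOD = "\u0f56\u0f7c\u0f51"   # བོད (Wylie: bod)
SKAD = "\u0f66\u0f90\u0f51"  # སྐད (Wylie: skad)

# Hypothetical toy lexicon (assumption): the labels are Wylie letters used as
# placeholders, not the phone or semi-syllable set used in the paper.
TOY_LEXICON = {
    BOD: {"phones": ["b", "o", "d"], "semi": ["b", "od"]},
    SKAD: {"phones": ["s", "k", "a", "d"], "semi": ["sk", "ad"]},
}


def segment_syllables(text):
    """Split Tibetan text into syllables on tsheg and shad marks."""
    text = text.replace(SHAD, TSHEG)
    return [syl.strip() for syl in text.split(TSHEG) if syl.strip()]


def to_units(text, unit="phones"):
    """Convert text into a flat sequence of phones or semi-syllables.

    Raises KeyError for syllables missing from the toy lexicon.
    """
    units = []
    for syllable in segment_syllables(text):
        units.extend(TOY_LEXICON[syllable][unit])
    return units


SAMPLE = BOD + TSHEG + SKAD + TSHEG  # བོད་སྐད་
phone_sequence = to_units(SAMPLE, "phones")
semi_sequence = to_units(SAMPLE, "semi")
```

Under these assumptions the same text yields a longer sequence of smaller units for phones and a shorter sequence of larger units for semi-syllables, which is the difference in input granularity that the experiments compare.
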
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/APSIPAASC47483.2019.9023093 | Asia-Pacific Signal and Information Processing Association Annual Summit and Conference |
DocType | ISSN | Citations |
Conference | 2309-9402 | 0 |
PageRank | References | Authors |
0.34 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Guan-Yu Li | 1 | 2 | 4.42 |
Lisai Luo | 2 | 0 | 0.34 |
Chunwei Gong | 3 | 0 | 0.34 |
Shiliang Lv | 4 | 0 | 0.34 |