Title
End-to-end Tibetan Speech Synthesis Based on Phones and Semi-syllables
Abstract
Because Tibetan characters have a two-dimensional structure, letter sequences are not a convenient input for an end-to-end speech synthesis system. Experiments are therefore conducted with phone sequences and with semi-syllable sequences as input. In both training and testing, the text is first segmented into a sequence of syllables, and the syllables are then converted into phones or semi-syllables to form the model's input sequence. The results show that the encoder-decoder alignment is better for phone-based Tibetan speech synthesis than for semi-syllable-based synthesis. In addition, the Highway network in the architecture plays a key role in the convergence of the model.
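The two-stage front end described in the abstract (syllable segmentation, then conversion to phones or semi-syllables) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how syllables are segmented or which unit inventories are used. The sketch assumes that syllables are delimited by the Tibetan tsheg (U+0F0B) and shad (U+0F0D) marks, and that the syllable-to-unit mapping comes from a caller-supplied lexicon. The function names `split_syllables` and `to_units` are hypothetical.

```python
import re

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)
SHAD = "\u0f0d"   # Tibetan clause-ending mark (shad)


def split_syllables(text):
    """Split Tibetan text into syllables on tsheg, shad, and whitespace.

    Assumption: delimiter-based splitting is sufficient; the paper's
    actual segmentation procedure is not given in the abstract.
    """
    parts = re.split("[" + TSHEG + SHAD + r"\s]+", text)
    return [p for p in parts if p]  # drop empty strings at the edges


def to_units(syllables, lexicon):
    """Flatten syllables into one unit sequence (phones or semi-syllables).

    `lexicon` is a hypothetical dict mapping each syllable to a list of
    unit labels. An unknown syllable raises KeyError.
    """
    units = []
    for syllable in syllables:
        units.extend(lexicon[syllable])
    return units
```

The same `to_units` call would serve both experimental conditions: passing a syllable-to-phone lexicon yields the phone sequence, and passing a syllable-to-semi-syllable lexicon yields the semi-syllable sequence. Any lexicon entries used with it here are placeholders, since the abstract lists no unit inventory.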
Year
2019
DOI
10.1109/APSIPAASC47483.2019.9023093
Venue
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
DocType
Conference
ISSN
2309-9402
Citations
0
PageRank
0.34
References
0
Authors
4
Name          Order  Citations  PageRank
Guan-Yu Li    1      2          4.42
Lisai Luo     2      0          0.34
Chunwei Gong  3      0          0.34
Shiliang Lv   4      0          0.34