Title
Phonologically Aware BiLSTM Model For Mongolian Phrase Break Prediction With Attention Mechanism
Abstract
Phrase break prediction is the first and most important component in increasing the naturalness and intelligibility of text-to-speech (TTS) systems. Most existing work relies on language-specific resources, large annotated corpora and feature engineering to perform well. However, phrase break prediction from text for Mongolian speech synthesis remains a great challenge because of the data sparsity problem caused by the scarcity of resources. In this paper, we introduce a Bidirectional Long Short-Term Memory (BiLSTM) model with an attention mechanism that uses position-based enhanced phonological representations, word embeddings and character embeddings to achieve state-of-the-art performance. The position-based enhanced phonological representations, derived from a separate BiLSTM model, comprise phoneme and syllable embeddings that carry position information. By using an attention mechanism, the model is able to dynamically decide how much information to use from a word or phonological component. To handle the out-of-vocabulary (OOV) problem, we incorporate word, phonological and character embeddings together as inputs to the model. Experimental results show that the proposed method significantly outperforms systems that use only word embeddings by successfully leveraging position-based phonological information and the attention mechanism.
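The abstract says the attention mechanism lets the model decide how much information to take from the word or the phonological component, but it does not give the exact formulation. The following is a minimal sketch of one common way to do this, assuming a per-dimension sigmoid gate over the concatenated vectors. The function name `gated_combine`, its parameters, and the toy values are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_combine(word_vec, phon_vec, gate_weights, gate_bias):
    """Blend two equal-length vectors with a per-dimension gate z in [0, 1].

    Assumed form (not confirmed by the source):
        z = sigmoid(W [word; phon] + b)
        out = z * word + (1 - z) * phon
    """
    if len(word_vec) != len(phon_vec):
        raise ValueError("word and phonological vectors must have equal length")
    concat = word_vec + phon_vec  # list concatenation: [word; phon]
    combined = []
    for i, (w, p) in enumerate(zip(word_vec, phon_vec)):
        score = sum(a * x for a, x in zip(gate_weights[i], concat)) + gate_bias[i]
        z = sigmoid(score)
        combined.append(z * w + (1.0 - z) * p)
    return combined


# Toy values: zero weights and zero bias give z = 0.5 in every dimension,
# so the result is the plain average of the two vectors.
word = [1.0, 0.0]
phon = [0.0, 1.0]
weights = [[0.0] * 4, [0.0] * 4]
bias = [0.0, 0.0]
result = gated_combine(word, phon, weights, bias)
```

In a trained model the gate weights would be learned, so the balance between the word and phonological representations could differ per word and per dimension. A large positive gate score pushes the output toward the word vector, and a large negative one pushes it toward the phonological vector.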
Year
2018
DOI
10.1007/978-3-319-97304-3_17
Venue
PRICAI 2018: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I
Keywords
Mongolian, Phrase break, Phonologically, Attention mechanism, Position
Field
Speech synthesis, Computer science, Naturalness, Phrase, Speech recognition, Feature engineering, Syllable, Artificial intelligence, Machine learning, Intelligibility (communication)
DocType
Conference
Volume
11012
ISSN
0302-9743
Citations
0
PageRank
0.34
References
22
Authors
5
Name
Order
Citations
PageRank
Rui Liu163.81
Fei Long21613.09
Guanglai Gao37824.57
Hui Zhang4136.39
Yonghe Wang502.37