Title
Modeling morphologically rich languages using split words and unstructured dependencies
Abstract
We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n - 1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than only from the preceding n - 1 positions. Our final model achieves a 27% perplexity reduction compared to the standard n-gram model.
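The idea of conditioning on tokens at arbitrary positions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`context`, `train`, `prob`), the `<pad>` symbol, the fixed `offsets` tuple, the unsmoothed maximum-likelihood estimate, and the toy stem/suffix tokens are all assumptions made for illustration. With `offsets=(-1,)` the sketch reduces to a standard bigram model; any other offsets condition on different positions in the sentence.

```python
from collections import Counter

def context(tokens, i, offsets, pad="<pad>"):
    """Return the conditioning tokens at the given offsets relative to position i."""
    return tuple(
        tokens[i + d] if 0 <= i + d < len(tokens) else pad
        for d in offsets
    )

def train(sentences, offsets):
    """Count (context, token) pairs and contexts over a tokenized corpus."""
    joint, ctx = Counter(), Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            c = context(tokens, i, offsets)
            joint[(c, tok)] += 1
            ctx[c] += 1
    return joint, ctx

def prob(joint, ctx, c, tok):
    """Unsmoothed maximum-likelihood estimate of P(tok | c)."""
    return joint[(c, tok)] / ctx[c] if ctx[c] else 0.0

# Hypothetical toy corpus: each word is already split into a stem and a suffix token.
corpus = [["kitap", "+lar", "geldi"], ["kitap", "+ı", "aldı"]]

# Standard bigram: condition on the immediately preceding token.
joint, ctx = train(corpus, offsets=(-1,))
p_standard = prob(joint, ctx, ("kitap",), "+lar")

# Flexible variant: condition on the following token instead.
joint_f, ctx_f = train(corpus, offsets=(1,))
p_flexible = prob(joint_f, ctx_f, ("+lar",), "kitap")
```

In this toy corpus `p_standard` is 0.5 and `p_flexible` is 1.0. A realistic model would add smoothing and would need some procedure for choosing which offsets to condition on, neither of which is specified in the abstract.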
Year
2009
Venue
ACL/IJCNLP (Short Papers)
Keywords
flexible n-gram model, splitting word, final model, unstructured dependency, significant perplexity reduction, perplexity reduction, morphologically rich language, disambiguator result, preceding n, split word, morphological analyzer, standard n-gram model
Field
Turkish, Suffix, Computer science, Perplexity reduction, Speech recognition, Natural language processing, Artificial intelligence, Sentence, Security token
DocType
Conference
Volume
P09-2
Citations
8
PageRank
0.71
References
12
Authors
2
Name, Order, Citations, PageRank
Deniz Yuret, 1, 684, 49.39
Ergun Biçici, 2, 133, 13.23