Title |
---|
Modeling morphologically rich languages using split words and unstructured dependencies |
Abstract |
---|
We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n − 1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than from the preceding n − 1 positions. Our final model achieves a 27% perplexity reduction compared to the standard n-gram model. |
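
The Flex-Gram idea above, that the conditioning tokens need not be the immediately preceding ones, can be illustrated for n = 2. The sketch below is a toy under stated assumptions, not the paper's estimator: it conditions each token on the token at one fixed relative offset (offset = -1 is the standard bigram), uses add-one smoothing, and reports perplexity. The class `OffsetBigram`, the `<s>` boundary token, and the hand-split Turkish tokens (stem plus a `+`-marked suffix, standing in for the output of a morphological analyzer and disambiguator) are all hypothetical.

```python
import math
from collections import Counter

BOUNDARY = "<s>"  # hypothetical token for positions outside the sentence


def context_token(sentence, i, offset):
    """Token at position i + offset, or BOUNDARY if that position is out of range."""
    if offset == 0:
        raise ValueError("offset 0 would condition a token on itself")
    j = i + offset
    return sentence[j] if 0 <= j < len(sentence) else BOUNDARY


class OffsetBigram:
    """Hypothetical add-one smoothed model of P(w_i | w_{i+offset}).

    offset=-1 is the standard bigram; any other nonzero offset conditions on a
    token elsewhere in the sentence, including tokens that come later.
    """

    def __init__(self, sentences, offset=-1):
        self.offset = offset
        self.pair_counts = Counter()     # (context, word) -> count
        self.context_counts = Counter()  # context -> count
        self.vocab = set()
        for s in sentences:
            for i, w in enumerate(s):
                c = context_token(s, i, offset)
                self.pair_counts[(c, w)] += 1
                self.context_counts[c] += 1
                self.vocab.add(w)

    def prob(self, word, context):
        # |vocab| + 1 reserves one slot of probability mass for unseen words.
        v = len(self.vocab) + 1
        return (self.pair_counts[(context, word)] + 1) / (self.context_counts[context] + v)

    def perplexity(self, sentences):
        # exp of the average negative log probability per token
        log_sum, n = 0.0, 0
        for s in sentences:
            for i, w in enumerate(s):
                log_sum += math.log(self.prob(w, context_token(s, i, self.offset)))
                n += 1
        return math.exp(-log_sum / n)


# Toy corpus of hand-split words: "evler" -> "ev" + "+ler" (stem + suffix).
train = [["ev", "+ler", "büyük"], ["ev", "+ler", "küçük"]]
baseline = OffsetBigram(train, offset=-1)  # standard bigram
flex = OffsetBigram(train, offset=+1)      # conditions on the next token
print(baseline.perplexity(train), flex.perplexity(train))
```

Making the offset a parameter is the point of the sketch: choosing where the context comes from becomes a model-selection question. Because a positive offset conditions on a later token, the per-token product under such a model is not a normalized probability of the whole sentence. For reading the abstract's figure, a 27% perplexity reduction means the final model's perplexity is 0.73 times the baseline's.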
Year | Venue | Keywords |
---|---|---|
2009 | ACL/IJCNLP (Short Papers) | flexible n-gram model,splitting word,final model,unstructured dependency,significant perplexity reduction,perplexity reduction,morphologically rich language,disambiguator result,preceding n,split word,morphological analyzer,standard n-gram model |
Field | DocType | Volume |
---|---|---|
Turkish,Suffix,Computer science,Perplexity reduction,Speech recognition,Natural language processing,Artificial intelligence,Sentence,Security token | Conference | P09-2 |
Citations | PageRank | References |
---|---|---|
8 | 0.71 | 12 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Deniz Yuret | 1 | 684 | 49.39 |
Ergun Biçici | 2 | 133 | 13.23 |