Modeling morphologically rich languages using split words and unstructured dependencies - Citegraph

Paper Info

Title
Modeling morphologically rich languages using split words and unstructured dependencies

Abstract
We experiment with splitting words into their stem and suffix components for modeling morphologically rich languages. We show that using a morphological analyzer and disambiguator results in a significant perplexity reduction in Turkish. We present flexible n-gram models, Flex-Grams, which assume that the n - 1 tokens that determine the probability of a given token can be chosen anywhere in the sentence rather than only from the preceding n - 1 positions. Our final model achieves a 27% perplexity reduction compared to the standard n-gram model.
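
The idea of conditioning on tokens at freely chosen positions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hand-split toy corpus, the `+` suffix marker, the add-one smoothing, the `<pad>` symbol, and all function names are assumptions made for illustration. To keep the factorization a valid left-to-right one, the sketch only uses earlier positions (negative offsets), which need not be adjacent to the predicted token.

```python
import math
from collections import Counter

def context(tokens, i, offsets):
    """Return the tokens at the given relative offsets from position i.

    Positions that fall outside the sentence are replaced by a padding symbol.
    """
    return tuple(
        tokens[i + d] if 0 <= i + d < len(tokens) else "<pad>"
        for d in offsets
    )

def train(sentences, offsets):
    """Count (context, token) pairs, where the context is read at `offsets`."""
    joint, ctx, vocab = Counter(), Counter(), set()
    for s in sentences:
        for i, tok in enumerate(s):
            c = context(s, i, offsets)
            joint[(c, tok)] += 1
            ctx[c] += 1
            vocab.add(tok)
    return joint, ctx, vocab

def prob(model, c, tok):
    """Add-one smoothed P(tok | c); one extra slot is reserved for unseen tokens."""
    joint, ctx, vocab = model
    return (joint[(c, tok)] + 1) / (ctx[c] + len(vocab) + 1)

def perplexity(model, sentences, offsets):
    """Per-token perplexity of the sentences under the model."""
    log_sum, n = 0.0, 0
    for s in sentences:
        for i, tok in enumerate(s):
            log_sum += math.log2(prob(model, context(s, i, offsets), tok))
            n += 1
    return 2 ** (-log_sum / n)

# Toy corpus, split by hand into stems and "+suffix" tokens (hypothetical data,
# not the output of a morphological analyzer).
corpus = [
    ["ev", "+ler", "+de"],       # evlerde
    ["kitap", "+lar", "+da"],    # kitaplarda
    ["ev", "+de"],               # evde
]

standard = (-1,)   # standard bigram: condition on the immediately preceding token
flexible = (-2,)   # flexible bigram: condition on the token two positions back

ppl_standard = perplexity(train(corpus, standard), corpus, standard)
ppl_flexible = perplexity(train(corpus, flexible), corpus, flexible)
print(f"standard offsets {standard}: perplexity {ppl_standard:.2f}")
print(f"flexible offsets {flexible}: perplexity {ppl_flexible:.2f}")
```

Changing the `offsets` tuple is the only difference between the two models here. The numbers printed on such a tiny corpus say nothing about the 27% reduction reported in the abstract.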

Year	Venue	Keywords
2009	ACL/IJCNLP (Short Papers)	flexible n-gram model, splitting word, final model, unstructured dependency, significant perplexity reduction, perplexity reduction, morphologically rich language, disambiguator result, preceding n, split word, morphological analyzer, standard n-gram model
Field	DocType	Volume
Turkish, Suffix, Computer science, Perplexity reduction, Speech recognition, Natural language processing, Artificial intelligence, Sentence, Security token	Conference	P09-2
Citations	PageRank	References
8	0.71	12
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Deniz Yuret	1	684	49.39
Ergun Biçici	2	133	13.23
