Title
A lemmatization method for Mongolian and its application to indexing for information retrieval
Abstract
In Mongolian, two different alphabets are used, Cyrillic and Mongolian. In this paper, we focus solely on the Mongolian language using the Cyrillic alphabet, in which a content word can be inflected when concatenated with one or more suffixes. Identifying the original form of content words is crucial for natural language processing and information retrieval. We propose a lemmatization method for Mongolian. The advantage of our lemmatization method is that it does not rely on noun dictionaries, enabling us to lemmatize out-of-dictionary words. We also apply our method to indexing for information retrieval. We use newspaper articles and technical abstracts in experiments that show the effectiveness of our method. Our research is the first significant exploration of the effectiveness of lemmatization for information retrieval in Mongolian.
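The abstract's idea of recovering a content word's original form without a noun dictionary, then indexing on that form, can be illustrated with a minimal sketch. This is not the authors' method: the suffix list, the minimum stem length, and the function names `strip_suffixes` and `build_index` are all assumptions made for illustration, and a real system would need further rules to avoid over-stripping.

```python
from collections import defaultdict

# Toy suffix list, assumed for illustration only; NOT the paper's rule set.
SUFFIXES = ("ууд", "үүд", "ын", "ийн", "аас", "ээс")
# Hypothetical guard so very short words are not stripped to nothing.
MIN_STEM_LEN = 2


def strip_suffixes(word):
    """Repeatedly remove a known suffix from the end of the word.

    No dictionary of nouns is consulted, so unseen words are handled
    the same way as known ones.
    """
    stem = word.lower()
    changed = True
    while changed:
        changed = False
        # Try longer suffixes first so a short suffix does not shadow them.
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
                stem = stem[: -len(suffix)]
                changed = True
                break
    return stem


def build_index(docs):
    """Map each stripped term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[strip_suffixes(token)].add(doc_id)
    return index


docs = {1: "номын сан", 2: "номууд"}
index = build_index(docs)
```

With this toy list, the inflected forms in both documents are indexed under the same term, so a query for the base form retrieves both.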
Year
2009
DOI
10.1016/j.ipm.2009.01.008
Venue
Inf. Process. Manage.
Keywords
cyrillic alphabet,lemmatization method,information retrieval,original form,natural language processing,lemmatization,newspaper article,noun dictionary,content word,mongolian language,different alphabet,noun,indexation
Field
Lemmatisation,Information processing,Content word,Information retrieval,Computer science,Noun,Search engine indexing,Newspaper,Natural language,Artificial intelligence,Concatenation,Natural language processing
DocType
Journal
Volume
Issue
ISSN
45
4
Information Processing and Management
Citations
5
PageRank
0.66
References
17
Authors
3
Name, Order, Citations, PageRank
Badam-Osor Khaltar, 1, 12, 1.57
Atsushi Fujii, 2, 486, 59.25
KhaltarBadam-Osor, 3, 5, 0.66