Title
Indexing of Textual Databases Based on Lexical Resources: A Case Study for Serbian.
Abstract
In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using a bag of words and named entity recognition. The approach was applied to a database of geological projects that the Republic of Serbia has financed for several decades. Each document in this database is described by a summary report consisting of metadata on the geological project, such as title, domain, keywords, abstract, and geographical location. A bag of words was produced from these metadata with the help of morphological dictionaries and transducers, while named entities were recognized using a rule-based system. Both were then used to pre-index documents for information retrieval, where the ranking of retrieved documents was based on several tf-idf-based measures. Ranked retrieval results obtained with pre-indexing were compared, using precision-recall curves, with results obtained by information retrieval without pre-indexing, showing a significant improvement in mean average precision.
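The ranking and evaluation steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' system: the documents and index terms are hypothetical toy data standing in for the output of the morphological dictionaries and the named entity recognizer, the scoring is one plain tf-idf variant (raw term frequency times log(N/df)) rather than any of the specific measures used in the paper, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_rank(query_terms, docs):
    """Rank document ids by the summed tf-idf weight of the query terms.

    docs maps a document id to its list of index terms (assumed to be
    already lemmatized; no morphological processing happens here).
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in docs.values() for term in set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        scores[doc_id] = sum(
            tf[t] * math.log(n / df[t]) for t in query_terms if df[t] > 0
        )
    # Highest score first; ties are broken by document id.
    return sorted(docs, key=lambda d: (-scores[d], d))


def average_precision(ranking, relevant):
    """Average of precision values at the ranks of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs is a list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy index, not data from the paper.
docs = {
    "p1": ["copper", "ore", "bor", "exploration"],
    "p2": ["groundwater", "belgrade", "exploration"],
    "p3": ["copper", "copper", "deposit", "bor"],
}
ranking = tfidf_rank(["copper", "bor"], docs)  # ['p3', 'p1', 'p2']
```

Mean average precision is then the mean of average_precision over all test queries, which allows a run with pre-indexing to be compared against a run without it on the same set of relevance judgments.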
Year: 2015
DOI: 10.1007/978-3-319-27932-9_15
Venue: International KEYSTONE Conference
DocType: Conference
Volume: 9398
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 6
Authors (4)
Name               Order  Citations  PageRank
Ranka Stankovic    1      10         10.02
Cvetana Krstev     2      30         12.10
Ivan Obradović     3      14         6.89
Olivera Kitanovic  4      0          1.01