Indexing of Textual Databases Based on Lexical Resources: A Case Study for Serbian.

Paper Info

Title
Indexing of Textual Databases Based on Lexical Resources: A Case Study for Serbian.

Abstract
In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using a bag-of-words representation and named entity recognition. The approach was applied to a database of geological projects financed by the Republic of Serbia over several decades. Each document in this database is described by a summary report consisting of metadata on the geological project, such as title, domain, keywords, abstract, and geographical location. A bag of words was produced from these metadata with the help of morphological dictionaries and transducers, while named entities were recognized using a rule-based system. Both were then used to pre-index documents for information retrieval, where the ranking of retrieved documents was based on several tf-idf based measures. Ranked retrieval results obtained with pre-indexing were compared, using precision-recall curves, to results obtained by information retrieval without pre-indexing, showing a significant improvement in terms of the mean average precision measure.
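The abstract does not say which tf-idf variants were used, so the following is only a minimal sketch of the general idea, assuming one common weighting (raw term frequency times log(N/df)) and the standard definition of mean average precision. The function names, the toy documents, and the query are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf_rank(query_terms, docs):
    """Rank documents by the summed tf-idf weight of the query terms.

    docs maps a document id to its list of index terms (a bag of words).
    Assumed weighting: tf * log(N / df); the paper's measures may differ.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in docs.values() for term in set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for term in query_terms:
            if tf[term]:
                score += tf[term] * math.log(n_docs / df[term])
        scores[doc_id] = score
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank holding a relevant document."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def mean_average_precision(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

In this sketch, pre-indexing would correspond to building each entry of `docs` from normalized terms (lemmas and recognized named entities) instead of raw tokens, and the comparison reported in the abstract would amount to computing `mean_average_precision` for both variants over the same set of queries.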

Year	DOI	Venue
2015	10.1007/978-3-319-27932-9_15	International KEYSTONE Conference
DocType	Volume	ISSN
Conference	9398	0302-9743
Citations	PageRank	References	Authors
0	0.34	6	4

Authors (4 rows)

Name	Order	Citations	PageRank
Ranka Stankovic	1	10	10.02
Cvetana Krstev	2	30	12.10
Ivan Obradović	3	14	6.89
Olivera Kitanovic	4	0	1.01