Abstract | ||
---|---|---|
In this paper we describe an approach to improving information retrieval results for large textual databases by pre-indexing documents using a bag of words and named entity recognition. The approach was applied to a database of geological projects financed by the Republic of Serbia over several decades. Each document in this database is described by a summary report consisting of metadata on the geological project, such as title, domain, keywords, abstract, and geographical location. A bag of words was produced from these metadata with the help of morphological dictionaries and transducers, while named entities were recognized using a rule-based system. Both were then used to pre-index documents for information retrieval, where the ranking of retrieved documents was based on several $tf\_idf$-based measures. Ranked retrieval results obtained with pre-indexing were compared, using precision-recall curves, to results obtained by information retrieval without pre-indexing, showing a significant improvement in terms of the mean average precision measure. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/978-3-319-27932-9_15 | International KEYSTONE Conference |
DocType | Volume | ISSN |
Conference | 9398 | 0302-9743 |
Citations | PageRank | References |
0 | 0.34 | 6 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ranka Stankovic | 1 | 10 | 10.02 |
Cvetana Krstev | 2 | 30 | 12.10 |
Ivan Obradović | 3 | 14 | 6.89 |
Olivera Kitanovic | 4 | 0 | 1.01 |
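
The abstract refers to ranking by tf-idf based measures and to evaluation by mean average precision, without stating the exact formulas used. The following is a minimal sketch of one plain variant, assuming raw term frequency multiplied by a logarithmic inverse document frequency; the function names and the toy token lists are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def tf_idf_rank(query_terms, docs):
    """Rank documents (lists of tokens) by the summed tf-idf of the query terms.

    Assumption: tf is the raw count and idf is log(N / df); the paper's
    own measures may differ.
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = sum(
            tf[t] * math.log(n / df[t])
            for t in query_terms
            if df[t] > 0  # skip terms that occur in no document
        )
        scores.append((score, idx))
    # Highest score first; ties are broken by document index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores]

def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical toy collection of already tokenized documents.
docs = [
    ["gold", "mine", "serbia"],
    ["coal", "mine"],
    ["gold", "gold", "deposit"],
]
ranking = tf_idf_rank(["gold"], docs)
```

In this sketch, pre-indexing would only change what goes into `docs` (for example lemmas and recognized named entities instead of raw tokens); the ranking and evaluation functions stay the same, which is what makes a comparison with and without pre-indexing possible.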