Abstract | ||
---|---|---|
The increasing amount of data on the Web, in particular Linked Data, has led to a diverse landscape of datasets, which makes entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities, can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities using the x-means and spectral clustering algorithms. In the second step, we propose an optimized retrieval model that takes advantage of the precomputed clusters. Given a user query and the set of entities retrieved by the BM25F retrieval approach, we expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to its relevance to the query. We perform a thorough experimental evaluation on the Billion Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state-of-the-art approaches. |
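
The expand-then-re-rank idea in the abstract can be illustrated with a minimal sketch. It is not the authors' retrieval model: the function `expand_and_rerank`, its parameters, and the Jaccard term overlap used for scoring are all assumptions made for illustration. The initial hits stand in for a BM25F result list, and `cluster_of` stands in for the precomputed x-means or spectral clusters.

```python
from collections import defaultdict


def expand_and_rerank(query_terms, initial_hits, cluster_of, entity_terms, top_k=10):
    """Expand initial hits with their cluster co-members, then re-rank.

    All names here are hypothetical. The scoring function is a plain
    Jaccard overlap, used only as a stand-in for the paper's model.
    """
    # Invert the entity -> cluster mapping into cluster -> members.
    members = defaultdict(set)
    for entity, cluster in cluster_of.items():
        members[cluster].add(entity)

    # Expansion: add every entity that shares a cluster with an initial hit.
    candidates = set(initial_hits)
    for entity in initial_hits:
        cluster = cluster_of.get(entity)
        if cluster is not None:
            candidates |= members[cluster]

    query = set(query_terms)

    def score(entity):
        # Jaccard similarity between query terms and entity terms.
        terms = set(entity_terms.get(entity, ()))
        if not terms:
            return 0.0
        return len(query & terms) / len(query | terms)

    # Re-rank by descending score; ties are broken by entity name.
    ranked = sorted(candidates, key=lambda e: (-score(e), e))
    return ranked[:top_k]


if __name__ == "__main__":
    cluster_of = {"a": 0, "b": 0, "c": 1}
    entity_terms = {
        "a": ["berlin", "city"],
        "b": ["berlin", "capital", "city"],
        "c": ["paris"],
    }
    print(expand_and_rerank(["berlin", "capital"], ["a"], cluster_of, entity_terms))
```

In this toy input, entity `b` is not among the initial hits but is pulled in through the cluster it shares with `a`, which is the situation the abstract describes for entities lacking explicit links.
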
Year | DOI | Venue |
---|---|---|
2017 | 10.1007/978-3-319-25007-6_28 | Proceedings of the 14th International Conference on The Semantic Web - ISWC 2015 - Volume 9366 |
DocType | Volume | ISSN |
Journal | abs/1703.10349 | 0302-9743 |
Citations | PageRank | References |
6 | 0.41 | 19 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Besnik Fetahu | 1 | 148 | 19.26 |
Ujwal Gadiraju | 2 | 69 | 8.42 |
Stefan Dietze | 3 | 597 | 68.07 |