Title |
---|
Reproduce. Generalize. Extend. On Information Retrieval Evaluation without Relevance Judgments |
Abstract |
---|
The evaluation of retrieval effectiveness by means of test collections is a commonly used methodology in the information retrieval field. Some researchers have addressed the quite fascinating research question of whether it is possible to evaluate effectiveness completely automatically, without human relevance assessments. Since human relevance assessment is one of the main costs of building a test collection, both in human time and money resources, this rather ambitious goal would have a practical impact. In this article, we reproduce the main results on evaluating information retrieval systems without relevance judgments; furthermore, we generalize such previous work to analyze the effect of test collections, evaluation metrics, and pool depth. We also expand the idea to semi-automatic evaluation and estimation of topic difficulty. Our results show that (i) previous work is overall reproducible, although some specific results are not; (ii) collection, metric, and pool depth impact the automatic evaluation of systems, which is anyway accurate in several cases; (iii) semi-automatic evaluation is an effective methodology; and (iv) automatic evaluation can (to some extent) be used to predict topic difficulty.
Year | DOI | Venue |
---|---|---
2018 | 10.1145/3241064 | Journal of Data and Information Quality |
Keywords | Field | DocType
---|---|---
Test collections, automatic retrieval evaluation, few topics, relevance judgments, reproducibility, topic difficulty | Research question, Information retrieval, Computer science | Journal
Volume | Issue | ISSN
---|---|---
10 | 3 | 1936-1955
Citations | PageRank | References
---|---|---
1 | 0.35 | 36
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---
Kevin Roitero | 1 | 30 | 13.74 |
Marco Passon | 2 | 1 | 0.35 |
Giuseppe Serra | 3 | 280 | 24.51 |
Stefano Mizzaro | 4 | 862 | 85.52 |