Title
Optimizing the cost of information retrieval test collections
Abstract
We consider the problem of optimally allocating limited resources to construct relevance judgements for a test collection that facilitates reliable evaluation of retrieval systems. We assume that there is a large set of test queries, for each of which a large number of documents need to be judged, although the available budget permits judging only a subset of them. A candidate solution to this problem has to deal with at least three challenges. (i) Given a fixed budget, it has to efficiently select a subset of query-document pairs for which to acquire relevance judgements. (ii) With the collected relevance judgements, it has to be able not only to accurately evaluate the set of systems participating in the test collection construction but also to reliably assess the performance of new, as yet unseen systems. (iii) Finally, it has to properly deal with uncertainty that is due to (a) the presence of unjudged documents in a ranked list, (b) the presence of queries with no relevance judgements, and (c) errors made by human assessors when labelling documents. In this thesis we propose an optimisation framework that accommodates appropriate solutions for each of the three challenges. Our approach is intended to benefit the construction of IR test collections by research institutes, e.g. NIST, or commercial search engines, e.g. Google and Bing, where there are large-scale document collections and abundant query logs but economic constraints prohibit gathering comprehensive relevance judgements.
Year: 2011
DOI: 10.1145/2065003.2065020
Venue: PIKM@CIKM
Keywords: available budget, test query, large number, large scale documents collection, relevance judgement, test collection construction, large set, comprehensive relevance judgement, test collection, ir test collection, information retrieval testcollections, search engine, resource allocation, information retrieval, evaluation
Field: Data mining, Search engine, Information retrieval, Computer science, NIST, Ranking (information retrieval), Resource allocation, Artificial intelligence, Economic constraints, Machine learning
DocType: Conference
Citations: 4
PageRank: 0.38
References: 8
Authors: 3
Name | Order | Citations | PageRank
Mehdi Hosseini | 1 | 54 | 3.77
Ingemar Cox | 2 | 3652 | 795.60
Natasa Milic-Frayling | 3 | 917 | 75.24