Title
Offline Evaluation without Gain
Abstract
We propose a simple and flexible framework for offline evaluation based on a weak ordering of results (which we call "partial preferences") that defines a set of ideal rankings for a query. These partial preferences can be derived from side-by-side preference judgments, from graded judgments, from a combination of the two, or through other methods. We then measure the performance of a ranker by computing the maximum similarity between the actual ranking it generates for the query and the elements of this ideal result set. We call this measure the "compatibility" of the actual ranking with the ideal result set. We demonstrate that compatibility can replace and extend current offline evaluation measures that depend on fixed relevance grades that must be mapped to gain values, such as NDCG. We examine a specific instance of compatibility based on rank biased overlap (RBO). We experimentally validate compatibility over multiple collections with different types of partial preferences, including very fine-grained preferences and partial preferences focused on the top ranks. As well as providing additional insights and flexibility, compatibility avoids shortcomings of both full preference judgments and traditional graded judgments.
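The abstract describes compatibility as the maximum rank similarity (here, RBO) between an actual ranking and any ideal ranking consistent with the partial preferences. The sketch below is a minimal, brute-force illustration of that idea, not the paper's actual algorithm: it assumes partial preferences are given as ordered groups of tied items, enumerates every total order consistent with them, and takes the maximum truncated RBO. The function names `rbo` and `compatibility` and the tie-group representation are illustrative assumptions; RBO is computed without its extrapolated residual, and no normalization by the best achievable score is applied.

```python
from itertools import permutations

def rbo(actual, ideal, p=0.95, depth=None):
    """Truncated rank-biased overlap: (1-p) * sum_d p^(d-1) * |overlap at d| / d.
    No residual/extrapolation term, so scores for short lists stay well below 1."""
    depth = depth or max(len(actual), len(ideal))
    score, seen_a, seen_i = 0.0, set(), set()
    for d in range(1, depth + 1):
        if d <= len(actual):
            seen_a.add(actual[d - 1])
        if d <= len(ideal):
            seen_i.add(ideal[d - 1])
        score += (p ** (d - 1)) * len(seen_a & seen_i) / d
    return (1 - p) * score

def compatibility(actual, preference_groups, p=0.95):
    """Maximum RBO between `actual` and any ideal ranking, where an ideal
    ranking is any total order consistent with the weak ordering: items in
    earlier groups precede those in later groups; ties break freely.
    Brute-force enumeration; feasible only for small tie groups."""
    def ideals(groups):
        if not groups:
            yield []
            return
        for head in permutations(groups[0]):
            for rest in ideals(groups[1:]):
                yield list(head) + rest
    return max(rbo(actual, ideal, p) for ideal in ideals(preference_groups))
```

Because the maximum is taken over all orderings of each tie group, two actual rankings that differ only by swapping tied items receive the same compatibility score, which is the behavior a weak ordering of preferences calls for.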
Year: 2020
DOI: 10.1145/3409256.3409816
Venue: ICTIR '20: The 2020 ACM SIGIR International Conference on the Theory of Information Retrieval, Virtual Event, Norway, September 2020
DocType: Conference
ISBN: 978-1-4503-8067-6
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name                | Order | Citations | PageRank
Charles L.A. Clarke | 1     | 3289      | 286.78
Alexandra Vtyurina  | 2     | 2         | 14.13
Mark D. Smucker     | 3     | 948       | 60.04