Abstract
---
We propose a simple and flexible framework for offline evaluation based on a weak ordering of results (which we call "partial preferences") that defines a set of ideal rankings for a query. These partial preferences can be derived from side-by-side preference judgments, from graded judgments, from a combination of the two, or through other methods. We then measure the performance of a ranker by computing the maximum similarity between the actual ranking it generates for the query and elements of this ideal result set. We call this measure the "compatibility" of the actual ranking with the ideal result set. We demonstrate that compatibility can replace and extend current offline evaluation measures that depend on fixed relevance grades that must be mapped to gain values, such as NDCG. We examine a specific instance of compatibility based on rank biased overlap (RBO). We experimentally validate compatibility over multiple collections with different types of partial preferences, including very fine-grained preferences and partial preferences focused on the top ranks. As well as providing additional insights and flexibility, compatibility avoids shortcomings of both full preference judgments and traditional graded judgments.
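To make the definitions concrete, the following is a minimal Python sketch of compatibility as the maximum truncated RBO between an actual ranking and a set of ideal rankings. The truncation depth, the persistence parameter `p = 0.95`, the normalization by an ideal ranking's self-RBO, and all identifiers here are illustrative assumptions, not the paper's exact formulation; the paper's precise RBO variant and normalization may differ.

```python
def rbo(run, ideal, p=0.95):
    """Truncated rank-biased overlap (Webber et al., 2010).

    The agreement at depth d is the size of the overlap between the
    two top-d prefixes divided by d; truncated RBO is the (1 - p)-scaled,
    p-weighted sum of the agreements over all depths.
    """
    depth = max(len(run), len(ideal))
    seen_run, seen_ideal = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        if d <= len(run):
            seen_run.add(run[d - 1])
        if d <= len(ideal):
            seen_ideal.add(ideal[d - 1])
        agreement = len(seen_run & seen_ideal) / d
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score


def compatibility(run, ideal_rankings, p=0.95):
    """Maximum similarity between the actual ranking and any member of
    the ideal result set, normalized (one plausible choice, assuming all
    ideal rankings share a length) so that exactly reproducing an ideal
    ranking scores 1.0."""
    best = max(rbo(run, ideal, p) for ideal in ideal_rankings)
    return best / rbo(ideal_rankings[0], ideal_rankings[0], p)


# Hypothetical example: partial preferences A > B and A > C, with B and C
# incomparable, admit two ideal rankings; a run matching either one is
# maximally compatible.
ideals = [["A", "B", "C"], ["A", "C", "B"]]
print(compatibility(["A", "C", "B"], ideals))  # 1.0: matches an ideal
print(compatibility(["B", "A", "C"], ideals))  # < 1.0: A belongs first
```

Taking the maximum over the ideal result set is what lets a weak ordering stand in for a single gold ranking: the run is never penalized for how it orders results the preferences leave tied.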
Year | DOI | Venue
---|---|---
2020 | 10.1145/3409256.3409816 | ICTIR '20: The 2020 ACM SIGIR International Conference on the Theory of Information Retrieval, Virtual Event, Norway, September 2020
DocType | ISBN | Citations
---|---|---
Conference | 978-1-4503-8067-6 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Charles L.A. Clarke | 1 | 3289 | 286.78 |
Alexandra Vtyurina | 2 | 21 | 4.13 |
Mark D. Smucker | 3 | 948 | 60.04 |