Abstract
---
We propose a simple and flexible framework for offline evaluation based on a weak ordering of results (which we call "partial preferences") that defines a set of ideal rankings for a query. These partial preferences can be derived from side-by-side preference judgments, from graded judgments, from a combination of the two, or through other methods. We then measure the performance of a ranker by computing the maximum similarity between the actual ranking it generates for the query and elements of this ideal result set. We call this measure the "compatibility" of the actual ranking with the ideal result set. We demonstrate that compatibility can replace and extend current offline evaluation measures that depend on fixed relevance grades that must be mapped to gain values, such as NDCG. We examine a specific instance of compatibility based on rank biased overlap (RBO). We experimentally validate compatibility over multiple collections with different types of partial preferences, including very fine-grained preferences and partial preferences focused on the top ranks. As well as providing additional insights and flexibility, compatibility avoids shortcomings of both full preference judgments and traditional graded judgments.
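To make the definitions concrete, the following is a minimal Python sketch of compatibility as the maximum truncated RBO between an actual ranking and a set of ideal rankings. The truncation depth, the persistence parameter `p = 0.95`, the normalization by an ideal ranking's self-RBO, and all identifiers here are illustrative assumptions, not the paper's exact formulation; the paper's precise RBO variant and normalization may differ.

```python
def rbo(run, ideal, p=0.95):
    """Truncated rank-biased overlap (Webber et al., 2010).

    The agreement at depth d is the size of the overlap between the
    two top-d prefixes divided by d; truncated RBO is the (1 - p)-scaled,
    p-weighted sum of the agreements over all depths.
    """
    depth = max(len(run), len(ideal))
    seen_run, seen_ideal = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        if d <= len(run):
            seen_run.add(run[d - 1])
        if d <= len(ideal):
            seen_ideal.add(ideal[d - 1])
        agreement = len(seen_run & seen_ideal) / d
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score


def compatibility(run, ideal_rankings, p=0.95):
    """Maximum similarity between the actual ranking and any member of
    the ideal result set, normalized (one plausible choice, assuming all
    ideal rankings share a length) so that exactly reproducing an ideal
    ranking scores 1.0."""
    best = max(rbo(run, ideal, p) for ideal in ideal_rankings)
    return best / rbo(ideal_rankings[0], ideal_rankings[0], p)


# Hypothetical example: partial preferences A > B and A > C, with B and C
# incomparable, admit two ideal rankings; a run matching either one is
# maximally compatible.
ideals = [["A", "B", "C"], ["A", "C", "B"]]
print(compatibility(["A", "C", "B"], ideals))  # 1.0: matches an ideal
print(compatibility(["B", "A", "C"], ideals))  # < 1.0: A belongs first
```

Taking the maximum over the ideal result set is what lets a weak ordering stand in for a single gold ranking: the run is never penalized for how it orders results the preferences leave tied.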
Year | DOI | Venue
---|---|---
2020 | 10.1145/3409256.3409816 | ICTIR '20: The 2020 ACM SIGIR International Conference on the Theory of Information Retrieval, Virtual Event, Norway, September 2020
DocType | ISBN | Citations
---|---|---
Conference | 978-1-4503-8067-6 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Charles L.A. Clarke | 1 | 3289 | 286.78 |
Alexandra Vtyurina | 2 | 21 | 4.13 |
Mark D. Smucker | 3 | 948 | 60.04 |