Title
On Transforming Relevance Scales
Abstract
Information Retrieval (IR) researchers have often reused existing IR evaluation collections and transformed the relevance scale in which judgments were collected, e.g., to use metrics that assume binary judgments such as Mean Average Precision. Such scale transformations are often arbitrary (e.g., 0 and 1 mapped to 0, and 2 and 3 mapped to 1), and it is assumed that they have no impact on the results of IR evaluation. Moreover, the use of crowdsourcing to collect relevance judgments has become a standard methodology. When designing a crowdsourced relevance judgment task, one of the decisions to be made is how granular the relevance scale used to collect judgments should be. This decision then has repercussions on the metrics used to measure IR system effectiveness. In this paper we look at the effect of scale transformations in a systematic way. We perform extensive experiments to study the transformation of judgments from fine-grained to coarse-grained scales. We use relevance judgments expressed on different relevance scales, either provided by expert annotators or collected by means of crowdsourcing. The objective is to understand the impact of relevance scale transformations on IR evaluation outcomes and to draw conclusions on how to best transform judgments into a different scale when necessary.
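As a concrete illustration of the kind of transformation the abstract refers to, the sketch below (hypothetical code, not from the paper) binarizes graded judgments on an assumed four-level scale {0, 1, 2, 3} at an assumed threshold of 2, i.e., grades 0 and 1 map to non-relevant and grades 2 and 3 map to relevant, and then computes Average Precision, a metric that assumes binary judgments. The data, function names, and threshold are illustrative assumptions.

```python
# A minimal sketch of the scale transformation described in the abstract:
# graded judgments on {0, 1, 2, 3} are binarized at a chosen threshold
# before computing a binary metric such as Average Precision.
# All names, data, and the threshold below are hypothetical.

def binarize(judgments, threshold=2):
    """Map graded judgments to binary relevance: grade >= threshold -> 1, else 0."""
    return {doc: int(grade >= threshold) for doc, grade in judgments.items()}

def average_precision(ranking, binary_qrels):
    """Average Precision of a ranked list against binary relevance judgments."""
    relevant_total = sum(binary_qrels.values())
    if relevant_total == 0:
        return 0.0
    hits, ap = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if binary_qrels.get(doc, 0) == 1:
            hits += 1
            ap += hits / rank
    return ap / relevant_total

if __name__ == "__main__":
    graded = {"d1": 3, "d2": 1, "d3": 2, "d4": 0}   # hypothetical graded judgments
    ranking = ["d2", "d1", "d3", "d4"]               # hypothetical system ranking
    # Threshold 2 reproduces the mapping mentioned in the abstract: {0,1} -> 0, {2,3} -> 1.
    print(average_precision(ranking, binarize(graded, threshold=2)))
```

Moving the threshold (e.g., binarizing at 1 or at 3 instead) changes which documents count as relevant and can therefore change metric values and system comparisons, which is the effect the paper sets out to measure.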
Year: 2019
DOI: 10.1145/3357384.3357988
Venue: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
Keywords: assessor agreement, crowdsourcing, ir evaluation, relevance scales
Field: Information retrieval, Computer science
DocType: Conference
ISBN: 978-1-4503-6976-3
Citations: 1
PageRank: 0.35
References: 0
Authors: 5
Name | Order | Citations | PageRank
Lei Han | 1 | 9 | 3.13
Kevin Roitero | 2 | 30 | 13.74
Eddy Maddalena | 3 | 18 | 3.57
Stefano Mizzaro | 4 | 862 | 85.52
Gianluca Demartini | 5 | 62 | 7.99