Title
Diversified search evaluation: lessons from the NTCIR-9 INTENT task
Abstract
The evaluation of diversified web search results is a relatively new research topic and is not as well-understood as the time-honoured evaluation methodology of traditional IR based on precision and recall. In diversity evaluation, one topic may have more than one intent, and systems are expected to balance relevance and diversity. The recent NTCIR-9 evaluation workshop launched a new task called INTENT which included a diversified web search subtask that differs from the TREC web diversity task in several aspects: the choice of evaluation metrics, the use of intent popularity and per-intent graded relevance, and the use of topic sets that are twice as large as those of TREC. The objective of this study is to examine whether these differences are useful, using the actual data recently obtained from the NTCIR-9 INTENT task. Our main experimental findings are: (1) The D♯ evaluation framework used at NTCIR provides more "intuitive" and statistically reliable results than Intent-Aware Expected Reciprocal Rank; (2) Utilising both intent popularity and per-intent graded relevance as is done at NTCIR tends to improve discriminative power, particularly for D♯-nDCG; and (3) Reducing the topic set size, even by just 10 topics, can affect not only significance testing but also the entire system ranking; when 50 topics are used (as in TREC) instead of 100 (as in NTCIR), the system ranking can be substantially different from the original ranking and the discriminative power can be halved. These results suggest that the directions being explored at NTCIR are valuable.
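The abstract contrasts the D♯ framework with Intent-Aware Expected Reciprocal Rank (ERR-IA). As a rough illustration of how the two differ, the sketch below computes D-nDCG (nDCG over popularity-weighted "global gains"), intent recall (I-rec), their linear combination D♯-nDCG, and ERR-IA for a toy ranked list. It is a minimal sketch under stated assumptions, not the official NTCIR or TREC evaluation tool: the intent probabilities, the use of the raw relevance grade as the gain value, gamma = 0.5, and all function and document names are illustrative.

```python
import math
from typing import Dict, List

# Assumed layout: intent -> {doc_id: graded relevance}, 0 = nonrelevant.
Qrels = Dict[str, Dict[str, int]]


def global_gain(doc: str, qrels: Qrels, intent_prob: Dict[str, float]) -> float:
    """Popularity-weighted sum of per-intent gains (raw grade used as gain: an assumption)."""
    return sum(intent_prob[i] * qrels[i].get(doc, 0) for i in qrels)


def d_ndcg(ranking: List[str], qrels: Qrels, intent_prob: Dict[str, float], cutoff: int) -> float:
    """nDCG@cutoff computed over global gains instead of flat per-topic gains."""
    gains = [global_gain(d, qrels, intent_prob) for d in ranking[:cutoff]]
    # Ideal list: every judged document, sorted by global gain.
    judged = {d for i in qrels for d in qrels[i]}
    ideal = sorted((global_gain(d, qrels, intent_prob) for d in judged), reverse=True)[:cutoff]
    dcg = sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))
    idcg = sum(g / math.log2(r + 1) for r, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


def i_rec(ranking: List[str], qrels: Qrels, cutoff: int) -> float:
    """Fraction of intents covered by at least one relevant document in the top cutoff."""
    top = ranking[:cutoff]
    covered = sum(1 for i in qrels if any(qrels[i].get(d, 0) > 0 for d in top))
    return covered / len(qrels)


def d_sharp_ndcg(ranking: List[str], qrels: Qrels, intent_prob: Dict[str, float],
                 cutoff: int, gamma: float = 0.5) -> float:
    """Linear combination of diversity (I-rec) and relevance (D-nDCG)."""
    return gamma * i_rec(ranking, qrels, cutoff) + (1 - gamma) * d_ndcg(ranking, qrels, intent_prob, cutoff)


def err_ia(ranking: List[str], qrels: Qrels, intent_prob: Dict[str, float],
           cutoff: int, max_grade: int) -> float:
    """Intent-aware ERR: per-intent ERR weighted by intent probability."""
    total = 0.0
    for intent, prob in intent_prob.items():
        p_continue, err = 1.0, 0.0
        for rank, doc in enumerate(ranking[:cutoff], start=1):
            # Probability that the user is satisfied by this document.
            p_stop = (2 ** qrels[intent].get(doc, 0) - 1) / 2 ** max_grade
            err += p_continue * p_stop / rank
            p_continue *= 1 - p_stop
        total += prob * err
    return total


# Hypothetical toy topic with two intents of unequal popularity.
qrels = {"i1": {"d1": 2, "d2": 1}, "i2": {"d3": 1}}
intent_prob = {"i1": 0.7, "i2": 0.3}
ranking = ["d1", "d3", "d4"]
print(d_sharp_ndcg(ranking, qrels, intent_prob, cutoff=3))
print(err_ia(ranking, qrels, intent_prob, cutoff=3, max_grade=2))
```

Here the toy list covers both intents, so I-rec is 1.0, while D-nDCG stays below 1.0 because the second-best document for the popular intent is missing. ERR-IA, by contrast, rewards mainly the single highly relevant document at rank 1.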
Year: 2013
DOI: 10.1007/s10791-012-9208-x
Venue: Inf. Retr.
Keywords: evaluation metrics, topic set, diversity evaluation, discriminative power, new research topic, per-intent graded relevance, time-honoured evaluation methodology, diversified search evaluation, evaluation framework, intent popularity, ntcir-9 intent task, ntcir-9 evaluation workshop
Field: Data mining, Learning to rank, Reciprocal, Significance testing, Ranking, Information retrieval, Computer science, Precision and recall, As is, Popularity, Discriminative model
DocType: Journal
Volume: 16
Issue: 4
ISSN: 1573-7659
Citations: 14
PageRank: 0.71
References: 24
Authors: 2
Name | Order | Citations | PageRank
Tetsuya Sakai | 1 | 1460 | 139.97
Ruihua Song | 2 | 1138 | 59.33