Repeatable evaluation of search services in dynamic environments - Citegraph

Paper Info

Title
Repeatable evaluation of search services in dynamic environments

Abstract
In dynamic environments, such as the World Wide Web, a changing document collection, query population, and set of search services demands frequent repetition of search effectiveness (relevance) evaluations. Reconstructing static test collections, such as in TREC, requires considerable human effort, as large collection sizes demand judgments deep into retrieved pools. In practice it is common to perform shallow evaluations over small numbers of live engines (often pairwise, engine A vs. engine B) without system pooling. Although these evaluations are not intended to construct reusable test collections, their utility depends on conclusions generalizing to the query population as a whole. We leverage the bootstrap estimate of the reproducibility probability of hypothesis tests in determining the query sample sizes required to ensure this, finding they are much larger than those required for static collections. We propose a semiautomatic evaluation framework to reduce this effort. We validate this framework against a manual evaluation of the top ten results of ten Web search engines across 896 queries in navigational and informational tasks. Augmenting manual judgments with pseudo-relevance judgments mined from Web taxonomies reduces both the chances of missing a correct pairwise conclusion, and those of finding an errant conclusion, by approximately 50%.
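
The abstract refers to a bootstrap estimate of the reproducibility probability of a hypothesis test. As a rough illustration only, and not the authors' implementation, the idea can be sketched as follows: resample per-query score differences between engine A and engine B with replacement, rerun the test on each resample, and report the fraction of resamples in which the test rejects the null hypothesis. The function names, the normal-approximation paired test, the default significance level, and the synthetic data are all assumptions made for this sketch.

```python
import math
import random
import statistics


def paired_test_rejects(diffs, alpha=0.05):
    """Two-sided test of mean(diffs) == 0 (normal approximation; an assumption)."""
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # Every difference is identical: reject only if that value is nonzero.
        return diffs[0] != 0
    z = statistics.mean(diffs) / (sd / math.sqrt(n))
    critical = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def reproducibility_probability(diffs, n_queries, n_boot=2000, alpha=0.05, seed=0):
    """Fraction of bootstrap resamples of size n_queries in which the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_boot):
        # Draw n_queries per-query differences with replacement.
        resample = rng.choices(diffs, k=n_queries)
        if paired_test_rejects(resample, alpha):
            rejections += 1
    return rejections / n_boot


if __name__ == "__main__":
    # Synthetic per-query differences (engine A score minus engine B score).
    rng = random.Random(42)
    observed = [rng.gauss(0.05, 0.3) for _ in range(200)]
    for size in (50, 200, 800):
        print(size, reproducibility_probability(observed, n_queries=size))
```

Evaluating the estimate at several values of `n_queries` gives a feel for how many queries a pairwise comparison would need before its conclusion is likely to repeat on a fresh sample from the same query population.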

Year	DOI	Venue
2007	10.1145/1292591.1292592	ACM Trans. Inf. Syst.
Keywords	Field	DocType
query population,web taxonomy,search services demand,correct pairwise conclusion,search effectiveness,query sample size,world wide web,repeatable evaluation,web search engine,document collection,dynamic environment,web search,considerable human effort,evaluation,sample size,hypothesis test	Population,Pairwise comparison,Data mining,Search engine,Information retrieval,Generalization,Computer science,Pooling,Web query classification,Statistical hypothesis testing,Sample size determination	Journal
Volume	Issue	ISSN
26	1	1046-8188
Citations	PageRank	References
130	3.66	37
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Eric C. Jensen	1	696	46.72
Steven M. Beitzel	2	696	46.72
Abdur Chowdhury	3	2013	160.59
Ophir Frieder	4	3300	419.55
