Abstract |
---|
With the growing popularity of information retrieval (IR) in distributed systems and in particular P2P Web search, a huge number of protocols and prototypes have been introduced in the literature. However, nearly every paper considers a different benchmark for its experimental evaluation, rendering their mutual comparison and the quantification of performance improvements an impossible task. We present a standardized, general-purpose benchmark for P2P IR systems that finally makes this possible. We start by presenting a detailed requirement analysis for such a standardized benchmark framework that allows for reproducible and comparable experimental setups without sacrificing flexibility to suit different system models. We further suggest Wikipedia as a publicly available and all-purpose document corpus and finally introduce a simple yet flexible clustering strategy that assigns the Wikipedia articles as documents to an arbitrary number of peers. After proposing a standardized, real-world query set as the benchmark workload, we review the metrics to evaluate the benchmark results and present an example benchmark run for our fully-implemented P2P Web search prototype MINERVA. |
Year | Venue | Keywords
---|---|---
2006 | ExpDB | distributed system, requirement analysis, information retrieval, system modeling, p2p

Field | DocType | Citations
---|---|---
Data mining, General purpose, Information retrieval, Computer science, Workload, Popularity, Requirements analysis, Rendering (computer graphics), Cluster analysis, SDET | Conference | 18

PageRank | References | Authors
---|---|---
0.79 | 21 | 6
Name | Order | Citations | PageRank
---|---|---|---
Thomas Neumann | 1 | 2523 | 156.50
Matthias Bender | 2 | 309 | 14.34
Sebastian Michel | 3 | 946 | 58.72
Gerhard Weikum | 4 | 12710 | 2146.01
Philippe Bonnet | 5 | 18 | 0.79
Ioana Manolescu | 6 | 2630 | 235.86