Title
Improving collection selection with overlap awareness in P2P search engines
Abstract
Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each collection, and subsequently the collections are ranked accordingly. Our thesis is that this simple approach is insufficient for several applications in which the collections typically overlap. This is the case, for example, for the collections built by autonomous peers crawling the web. We argue for the extension of existing quality measures using estimators of mutual overlap among collections and present experiments in which this combination outperforms CORI, a popular approach based on quality estimation. We outline our prototype implementation of a P2P web search engine, coined MINERVA, that allows handling large amounts of data in a distributed and self-organizing manner. We conduct experiments which show that taking overlap into account during collection selection can drastically decrease the number of collections that have to be contacted in order to reach a satisfactory level of recall, which is a great step toward the feasibility of distributed web search.
Year
2005
DOI
10.1145/1076034.1076049
Venue
SIGIR
Keywords
improving collection selection, great step, popular approach, p2p search engine, autonomous peer, collection selection, simple approach, expected result quality, large amount, web search, p2p web search engine, quality estimation, self organization, p2p, web search engine, search engine
Field
Collection selection, Web search engine, Data mining, Computer science, Artificial intelligence, Web search query, Search engine, Crawling, Information retrieval, Ranking, Recall, Machine learning, Estimator
DocType
Conference
ISBN
1-59593-034-5
Citations
57
PageRank
1.62
References
23
Authors
5
Name                 Order  Citations  PageRank
Matthias Bender      1      309        14.34
Sebastian Michel     2      946        58.72
Peter Triantafillou  3      1261       151.76
Gerhard Weikum       4      12710      2146.01
Christian Zimmer     5      282        13.36