Abstract | ||
---|---|---|
The problem of fully decentralized search over many collections is considered. The objective is to approximate the results of centralized search (namely, using a central index) while controlling the communication cost and involving only a small number of collections. The proposed solution is couched in a peer-to-peer (P2P) network, but can also be applied in other setups. Peers publish per-term summaries of their collections. Specifically, for each term, the range of document scores is divided into intervals; and for each interval, a KMV (K Minimal Values) synopsis of its documents is created. A new peer-selection algorithm uses the KMV synopses and two scoring functions in order to adaptively rank the peers, according to the relevance of their documents to a given query. The proposed method achieves high-quality results while meeting the above criteria of efficiency. In particular, experiments are done on two large, real-world datasets; one is blogs and the other is web data. These experiments show that the algorithm outperforms the state-of-the-art approaches and is robust over different collections, various scoring functions and multi-term queries. |
Year | DOI | Venue |
---|---|---|
2011 | 10.1145/1935826.1935860 | WSDM |
Keywords | Field | DocType |
central index,various scoring function,proposed solution,decentralized search,kmv synopsis,new peer-selection algorithm,centralized search,scoring function,adaptive peer-selection algorithm,k minimal values,p2p,indexation,score function | Small number,Publication,Data mining,Information retrieval,Computer science,Selection algorithm | Conference |
Citations | PageRank | References |
1 | 0.35 | 24 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yosi Mass | 1 | 574 | 60.91 |
Yehoshua Sagiv | 2 | 5362 | 1575.95 |
Michal Shmueli-Scheuer | 3 | 89 | 16.11 |