Abstract | ||
---|---|---|
We consider the problem of full-text search involving multi-term queries in a network of self-organizing, autonomous peers. Existing approaches do not scale well with respect to the number of peers, because they either require access to a large number of peers or incur a high communication cost in order to achieve good query results. In this paper, we present a novel algorithmic framework for processing multi-term queries in P2P networks that achieves high recall while using (per-query) a small number of peers and a low communication cost, thereby enabling high query throughput. Our approach is based on per-query peer-selection strategy using two-dimensional histograms of score distributions. A full utilization of the histograms incurs a high communication cost. We show how to drastically reduce this cost by employing a two-phase peer-selection algorithm. We also describe an adaptive approach to peer selection that further increases the recall. Experiments on a large real-world collection show that the recall is indeed high while the number of involved peers and the communication cost are low. |
Year | DOI | Venue |
---|---|---|
2009 | 10.1145/1645953.1646281 | CIKM |
Keywords | Field | DocType |
effective full-text search,good query result,multi-term query,communication cost,adaptive approach,p2p network,high recall,large number,high communication cost,low communication cost,small number,high query throughput,p2p,self organization,clustering,histograms | Small number,Query throughput,Data mining,Histogram,Computer science,Full text search,Cluster analysis,Recall,Scalability | Conference |
Citations | PageRank | References |
4 | 0.42 | 6 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yosi Mass | 1 | 574 | 60.91 |
Yehoshua Sagiv | 2 | 5362 | 1575.95 |
Michal Shmueli-Scheuer | 3 | 89 | 16.11 |