Title
Reactive index replication for distributed search engines
Abstract
Distributed search engines comprise multiple sites deployed across geographically distant regions, with each site specialized to serve the queries of local users. When a search site cannot accurately compute the results of a query, it must forward the query to other sites. This paper considers the problem of selecting the documents indexed by each site, focusing on replication to increase the fraction of queries processed locally. We propose RIP, a practical algorithm for replicating documents and posting lists that has two important features: it evaluates user interests in an online fashion, and it uses only the local data of a site. The online approach reduces operational complexity, while locality enables higher performance when processing queries and documents. In addition to being online and local, the decision procedure incorporates document popularity and user queries, which is critical when each site has a replication budget. This budget reflects the hardware constraints of a given site. We evaluate RIP against the approach of replicating popular documents statically and show that it achieves significant gains, with the additional benefit of supporting incremental indexes.
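The abstract does not spell out RIP's decision procedure, so the following is only a minimal sketch of the general idea it describes: a site that decides online, from its own query traffic alone, which remote documents to replicate under a fixed budget. A simple frequency-based (LFU-like) replacement rule stands in for the actual procedure, and every name here (`ReplicaSet`, `budget`, `observe`) is hypothetical rather than taken from the paper.

```python
from collections import Counter


class ReplicaSet:
    """Hypothetical budget-limited replica store for one search site.

    This is NOT the RIP algorithm; it is an LFU-like stand-in that uses
    only locally observed demand and respects a replication budget.
    """

    def __init__(self, budget):
        self.budget = budget      # maximum number of replicated documents
        self.replicas = set()     # ids of documents replicated at this site
        self.demand = Counter()   # local request count per remote document

    def observe(self, doc_ids):
        """Record the remote documents that one local query needed.

        Returns True if the query could be processed locally, i.e. every
        needed document was already replicated before this call.
        """
        local = all(d in self.replicas for d in doc_ids)
        for d in doc_ids:
            self.demand[d] += 1
            self._maybe_replicate(d)
        return local

    def _maybe_replicate(self, doc_id):
        if doc_id in self.replicas:
            return
        if len(self.replicas) < self.budget:
            self.replicas.add(doc_id)
            return
        # Budget exhausted: evict the least-demanded replica, but only if
        # the candidate has strictly higher local demand (ties keep the
        # incumbent, which avoids churn).
        victim = min(self.replicas, key=lambda d: (self.demand[d], d))
        if self.demand[doc_id] > self.demand[victim]:
            self.replicas.remove(victim)
            self.replicas.add(doc_id)
```

In this sketch, a query that returns False from `observe` corresponds to one that would have to be forwarded to another site. A static baseline would instead fix `replicas` once from global popularity and never update it.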
Year
2012
DOI
10.1145/2348283.2348394
Venue
SIGIR
Keywords
online fashion, local data, multiple site, popular documents statically, search engine, local user, search site, replication budget, processing query, online approach, reactive index replication, indexation, web search engine, replication
Field
Data mining, Locality, Search engine, Information retrieval, Computer science, Popularity, Distributed index, Database
DocType
Conference
Citations
3
PageRank
0.38
References
14
Authors
3
Name                 Order  Citations  PageRank
Flavio P. Junqueira  1      1037       49.96
Vincent Leroy        2      197        18.23
Matthieu Morel       3      79         5.95