Title
Quantifying performance and quality gains in distributed web search engines
Abstract
Distributed search engines based on geographical partitioning of a central Web index emerge as a feasible solution to the immense growth of the Web, user bases, and query traffic. However, there is still a lack of research quantifying the performance and quality gains that such architectures can achieve. In this paper, we develop various cost models to evaluate the performance benefits of a geographically distributed search engine architecture based on partial index replication and query forwarding. Specifically, we focus on possible performance gains due to the distributed nature of the query processing and Web crawling processes. We show that any response time gain achieved by distributed query processing can be used to improve search relevance, since it enables the use of complex but more accurate algorithms for document ranking. We also show that distributed Web crawling leads to better Web coverage, and we investigate whether this improves search quality. We verify the validity of our claims over large, real-life datasets via simulations.
Year
2009
DOI
10.1145/1571941.1572013
Venue
SIGIR
Keywords
web crawling,quantifying performance,search relevance,search quality,web coverage,search engine,central web index,web search engine,query processing,quality gain,search engine architecture,query traffic,query forwarding,indexation,data centers,data center
Field
Web search query,Information retrieval,Query expansion,Computer science,Data Web,Web query classification,Search engine indexing,Web modeling,Web crawler,Distributed web crawling
DocType
Conference
Citations
29
PageRank
1.14
References
11
Authors
3
Name                   Order  Citations  PageRank
B. Barla Cambazoglu    1      735        38.87
Vassilis Plachouras    2      943        57.41
Ricardo Baeza-Yates    3      6173       635.97