Title
Document assignment in multi-site search engines
Abstract
Assigning documents accurately to sites is critical for the performance of multi-site Web search engines. In such settings, sites crawl only documents they index and forward queries to obtain best-matching documents from other sites. Inaccurate assignments may lead to inefficiencies when crawling Web pages or processing user queries. In this work, we propose a machine-learned document assignment strategy that uses the locality of document views in search results to decide upon assignments. We evaluate the performance of our strategy using various document features extracted from a large Web collection. Our experimental setup uses query logs from a number of search front-ends spread across different geographic locations and uses these logs to learn the document access patterns. We compare our technique against baselines such as region- and language-based document assignment and observe that our technique achieves substantial performance improvements with respect to recall. With our technique, we are able to obtain a small query forwarding rate (0.04) requiring roughly 45% less replication of documents compared to replicating all documents across all sites.
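The abstract's idea of using where documents are viewed to decide where they are stored can be illustrated with a small sketch. This is not the paper's machine-learned strategy: it swaps in a plain frequency heuristic, and the function names, the `threshold` parameter, and the sample view log are all assumptions made for illustration. Forwarding is also counted per document view rather than per query, which is a simplification.

```python
from collections import Counter, defaultdict


def assign_documents(views, threshold=0.3):
    """Assign each document to the sites that account for its views.

    views: iterable of (doc_id, site) pairs, one per search-result view.
    A document is stored at every site whose share of its views is at
    least `threshold` (hypothetical parameter), and always at its
    most-viewing site so that every document has a home.
    """
    per_doc = defaultdict(Counter)
    for doc, site in views:
        per_doc[doc][site] += 1

    assignment = {}
    for doc, counts in per_doc.items():
        total = sum(counts.values())
        sites = {s for s, c in counts.items() if c / total >= threshold}
        sites.add(max(counts, key=counts.get))
        assignment[doc] = sites
    return assignment


def forwarding_rate(views, assignment):
    """Fraction of views whose document is not stored at the viewing site."""
    views = list(views)
    missed = sum(1 for doc, site in views
                 if site not in assignment.get(doc, set()))
    return missed / len(views)


def replication_factor(assignment, n_sites):
    """Stored copies relative to replicating every document on every site."""
    copies = sum(len(sites) for sites in assignment.values())
    return copies / (len(assignment) * n_sites)


# Hypothetical view log: three documents, three sites.
sample_views = (
    [("d1", "eu")] * 8 + [("d1", "us")] * 2
    + [("d2", "us")] * 5 + [("d2", "asia")] * 5
    + [("d3", "asia")] * 4
)
sample_assignment = assign_documents(sample_views, threshold=0.3)
sample_rate = forwarding_rate(sample_views, sample_assignment)
sample_replication = replication_factor(sample_assignment, n_sites=3)
```

Raising `threshold` in this sketch stores fewer copies but forwards more views, which is the trade-off between forwarding rate and replication that the abstract reports.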
Year: 2011
DOI: 10.1145/1935826.1935907
Venue: WSDM
Keywords: multi-site web search engine,crawling web page,document access pattern,large web collection,document view,assigning document,various document,machine-learned document assignment strategy,language-based document assignment,multi-site search engine,best-matching document,forward rates,web pages,feature extraction,classification,web search engine,front end,machine learning,search engine
Field: Web search engine,Data mining,Web search query,Locality,Crawling,Search engine,Web page,Query expansion,Information retrieval,Computer science,Web query classification
DocType: Conference
Citations: 7
PageRank: 0.44
References: 31
Authors: 3
Name
Order
Citations
PageRank
Ulf Brefeld163351.89
B. Barla Cambazoglu273538.87
Flavio P. Junqueira3103749.96