Title
Locality in search engine queries and its implications for caching
Abstract
Caching is a popular technique for reducing both server load and user response time in distributed systems. We consider whether caching might be effective for search engines as well. We study two real search engine traces by examining query locality and its implications for caching. Our trace analysis yields three results. First, queries have significant locality, with query frequency following a Zipf distribution. Very popular queries are shared among different users and can be cached at servers or proxies, while 16% to 22% of the queries are repeated by the same users and should be cached at the user side. Multiple-word queries are shared less and should be cached mainly at the user side. Second, if caching is done at the user side, short-term caching for hours is enough to cover query temporal locality, while server/proxy caching should use longer periods, such as days. Third, most users have small lexicons when submitting queries. Frequent users who submit many search requests tend to reuse a small subset of words to form queries. Thus, with proxy or user side caching, prefetching based on the user lexicon looks promising.
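The locality measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code: the trace format (a list of (user_id, query) pairs in arrival order) and the function names locality_stats and zipf_slope are assumptions made for illustration. The first function splits repeated queries into those repeated by the same user and those first seen from a different user; the second estimates the slope of log frequency against log rank, which is roughly constant and negative for Zipf-like data.

```python
from collections import Counter, defaultdict
import math


def locality_stats(trace):
    """trace: list of (user_id, query) pairs in arrival order (assumed format)."""
    seen_by = defaultdict(set)  # query -> users who have already submitted it
    same_user = shared = 0
    for user, query in trace:
        users = seen_by[query]
        if user in users:
            same_user += 1  # this user is repeating their own earlier query
        elif users:
            shared += 1     # new for this user, but another user asked it before
        users.add(user)
    n = len(trace)
    return {"same_user": same_user / n, "shared": shared / n}


def zipf_slope(trace):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two distinct queries, otherwise the denominator is zero.
    """
    freqs = sorted(Counter(q for _, q in trace).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Tiny made-up trace, for illustration only.
    demo = [("u1", "a"), ("u2", "a"), ("u1", "a"), ("u3", "b")]
    print(locality_stats(demo))
    print(zipf_slope(demo))
```

In this sketch, a high same_user share would argue for user side caching and a high shared share for server or proxy caching, mirroring the distinction the abstract draws.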
Year: 2002
DOI: 10.1109/INFCOM.2002.1019374
Venue: INFOCOM
Keywords: prefetching, user response time, zipf distribution, distributed systems, lexicons, cache storage, proxy caching, server load, server caching, search engine queries, query locality, caching, query temporal locality, search engines, query processing, ranking, frequency, efficiency, internet, reaction time, statistical analysis, distributed computing, sharing, computer science, bandwidth, optimization, reduction, performance engineering, computer networks, data management, information retrieval, search engine
Field: Zipf's law, Locality, Search engine, Locality of reference, Information retrieval, Ranking, Computer science, Cache, Server, Database, The Internet
DocType: Conference
Volume: 3
ISSN: 0743-166X
ISBN: 0-7803-7476-2
Citations: 96
PageRank: 7.85
References: 8
Authors: 2
Name, Order, Citations, PageRank
Yinglian Xie, 1, 1140, 76.73
David R. O'Hallaron, 2, 1243, 126.28