Abstract |
---|
Search is the most heavily used web application in the world and is still growing at an extraordinary rate. Understanding the behavior of web search engines is therefore becoming increasingly important to the design and deployment of data center systems hosting search engines. In this paper, we study three search query traces collected from real-world web search engines operated by three different search service providers. The first part of our study uncovers the patterns hidden in the query traces by analyzing the variations, frequencies, and locality of query requests. Our analysis reveals that, contrary to some previous studies, real-world query traces do not follow well-defined probability models, such as the Poisson and log-normal distributions. The second part of our study deploys the real query traces, together with three synthetic traces generated using probability models proposed by other researchers, on a Nutch-based search engine. The measured performance data from the deployments further confirm that synthetic traces do not accurately reflect the real traces. We develop an evaluation tool that can collect performance metrics online with negligible overhead. The performance metrics include average response time, CPU utilization, disk accesses, and cycles per instruction. The third part of our study compares the search engine with representative benchmarks, namely Gridmix, SPECweb2005, TPC-C, SPECCPU2006, and HPCC, with respect to basic architecture-level characteristics and performance metrics, such as instruction mix, processor pipeline stall breakdown, memory access latency, and disk accesses. The experimental results show that web search engines have a high percentage of load/store instructions but good cache/memory performance. We hope the results presented in this paper will enable system designers to gain insight into optimizing systems hosting search engines. |
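
The abstract's claim that real query traces do not follow a Poisson model can be illustrated with a quick dispersion check. The sketch below is not the paper's method. It uses the variance-to-mean ratio of per-second query counts, which is close to 1 for Poisson arrivals, as a simple stand-in test. All function names, the arrival rates, and the two synthetic traces are assumptions made for illustration; no data from the paper is used.

```python
import random
from statistics import mean, pvariance


def counts_per_window(timestamps, window):
    """Bucket arrival timestamps (seconds) into fixed windows and count each."""
    if not timestamps:
        return []
    start = min(timestamps)
    n_windows = int((max(timestamps) - start) // window) + 1
    counts = [0] * n_windows
    for t in timestamps:
        counts[int((t - start) // window)] += 1
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio of the counts; close to 1 for Poisson counts."""
    m = mean(counts)
    return pvariance(counts) / m if m > 0 else float("nan")


def poisson_arrivals(rate, duration, rng):
    """Synthetic Poisson process built from exponential inter-arrival gaps."""
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= duration:
            return out
        out.append(t)


def bursty_arrivals(rate, duration, rng):
    """Hypothetical bursty trace: the rate alternates high/low every minute."""
    out = []
    for minute in range(int(duration // 60)):
        r = rate * (1.8 if minute % 2 == 0 else 0.2)
        out.extend(minute * 60 + t for t in poisson_arrivals(r, 60, rng))
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
steady = dispersion_index(counts_per_window(poisson_arrivals(50, 3600, rng), 1.0))
bursty = dispersion_index(counts_per_window(bursty_arrivals(50, 3600, rng), 1.0))
print(f"Poisson trace dispersion: {steady:.2f}")
print(f"bursty trace dispersion:  {bursty:.2f}")
```

With a real trace, the timestamps would come from the query log instead of the generators. A ratio far above 1 suggests burstiness that a single-rate Poisson model would miss, though a formal goodness-of-fit test would be needed before drawing firm conclusions.
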
Year | DOI | Venue |
---|---|---|
2011 | 10.1109/IISWC.2011.6114193 | IISWC |
Keywords | Field | DocType |
poisson distribution,architecture-level characteristics,measured performance data,synthetic trace,different search service provider,search query,probability model,web search engines,search service providers,search engine,web search engine,real world web search,log-normal distribution,search query traces,internet,processor pipeline stall breakdown,real workloads,memory performance,memory access latency,nutch based search engine,web application,disk access,search engines,performance metrics,instruction mix,data center system,cpu utilization,log normal distribution,service provider,system design,servers,data center,cache memory,cycles per instruction,engines,benchmark testing | Web search query,Search engine,Query expansion,CPU time,Computer science,Cache,Parallel computing,Web query classification,Real-time computing,Search analytics,Web application,Database | Conference |
ISBN | Citations | PageRank |
978-1-4577-2062-8 | 13 | 1.24 |
References | Authors |
10 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Huafeng Xi | 1 | 13 | 1.24 |
Jianfeng Zhan | 2 | 767 | 62.86 |
Zhen Jia | 3 | 338 | 17.82 |
Xuehai Hong | 4 | 17 | 2.04 |
Lei Wang | 5 | 577 | 46.85 |
Lixin Zhang | 6 | 571 | 45.96 |
SUN Ning-Hui | 7 | 1268 | 97.37 |
Gang Lu | 8 | 311 | 12.40 |