A comparison of techniques to find mirrored hosts on the WWW - Citegraph

Paper Info

Title
A comparison of techniques to find mirrored hosts on the WWW

Abstract
We compare several algorithms for identifying mirrored hosts on the World Wide Web. The algorithms operate on the basis of URL strings and linkage data: the type of information easily available from web proxies and crawlers. Identification of mirrored hosts can improve web-based information retrieval in several ways: First, by identifying mirrored hosts, search engines can avoid storing and returning duplicate documents. Second, several new information retrieval techniques for the Web make inferences based on the explicit links among hypertext documents; mirroring perturbs their graph model and degrades performance. Third, mirroring information can be used to redirect users to alternate mirror sites to compensate for various failures, and can thus improve the performance of web browsers and proxies. This work was presented at the Workshop on Organizing Web Space at the Fourth ACM Conference on

Year	DOI	Venue
1999	10.1002/1097-4571(2000)9999:99993.0.CO;2-0	Journal of the American Society for Information Science
Keywords	DocType	Volume
world wide web,information retrieval	Conference	51
Issue	ISSN	Citations
12	0002-8231	50
PageRank	References	Authors
7.70	10	4

Authors (4 rows)

Name	Order	Citations	PageRank
Krishna A. Bharat	1	1211	252.86
Andrei Broder	2	7357	920.20
Jeffrey Dean	3	8304	671.50
Monika Rauch Henzinger	4	4307	481.86
