Title
Reprint of: Efficient crawling through URL ordering
Abstract
In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evaluate the ordering schemes on the Stanford University Web. Our results show that a crawler with a good ordering scheme can obtain important pages significantly faster than one without.
Year
2012
DOI
10.1016/j.comnet.2012.10.006
Venue
Computer Networks
Keywords
Crawling, URL ordering
DocType
Journal
Volume
56
Issue
18
ISSN
1389-1286
Citations
1
PageRank
0.36
References
0
Authors
3
Name                  Order  Citations  PageRank
Junghoo Cho           1      3088       584.54
Héctor García-Molina  2      24359      5652.13
Lawrence Page         3      6544       793.31