Reprint of: Efficient crawling through URL ordering - Citegraph

Paper Info

Title
Reprint of: Efficient crawling through URL ordering

Abstract
In this paper we study in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evaluate the ordering schemes on the Stanford University Web. Our results show that a crawler with a good ordering scheme can obtain important pages significantly faster than one without.

Field	Value
Year	2012
DOI	10.1016/j.comnet.2012.10.006
Venue	Computer Networks
Keywords	Crawling, URL ordering
DocType	Journal
Volume	56
Issue	18
ISSN	1389-1286
Citations	1
PageRank	0.36
References	0
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Junghoo Cho	1	3088	584.54
Héctor García-Molina	2	24359	5652.13
Lawrence Page	3	6544	793.31
