Title
Fractional PageRank Crawler: Prioritizing URLs Efficiently for Crawling Important Pages Early
Abstract
Crawling important pages early is a well-studied problem. However, the growing variety of frameworks for publishing web content has greatly increased the number of web pages, so a crawler must be fast enough to prioritize and download the important pages. Because the importance of a page is not known before or during its download, the crawler spends a great deal of time approximating importance in order to prioritize the download of web pages. In this research, we propose Fractional PageRank crawlers, which prioritize the downloaded pages in order to discover important URLs early in the crawl. Our experiments demonstrate that they dramatically improve running time while still crawling the important pages early.
Year
2009
DOI
10.1007/978-3-642-00887-0_52
Venue
DASFAA
Keywords
great deal, fractional pagerank crawler, web content, important page, web page, important urls, different type, crawling important pages early, prioritizing urls efficiently, web pages
Field
PageRank, World Wide Web, HITS algorithm, Information retrieval, Web page, Computer science, Doorway page, Focused crawler, Web content, Web crawler, Database, Distributed web crawling
DocType
Conference
Volume
5463
ISSN
0302-9743
Citations
3
PageRank
0.47
References
2
Authors
3
Name               Order  Citations  PageRank
Md. Hijbul Alam    1      26         3.88
JongWoo Ha         2      55         6.79
Sangkeun Lee       3      498        65.59