Title
Predictive Crawling for Commercial Web Content
Abstract
Web crawlers spend significant resources maintaining the freshness of their crawled data. This paper describes the optimization of resources to ensure that product prices shown in ads, in the context of a shopping sponsored search service, are synchronized with current merchant prices. We use the predictability of price changes to build a machine-learned system that leads to considerable resource savings for both the merchants and the crawler. We describe our solutions to technical challenges arising from partial observability of price history, feedback loops created by applying machine-learned models, and offers in a cold-start state. Empirical evaluation over large-scale product crawl data demonstrates the effectiveness of our model and confirms its robustness to unseen data. We argue that our approach may be applicable in more general data-pull settings.
Year
2019
DOI
10.1145/3308558.3313694
Venue
WWW '19: The World Wide Web Conference
Keywords
Commercial Content Change Dynamics, Predictive Crawling, Product Search
Field
World Wide Web, Predictability, Observability, Crawling, Computer science, Robustness (computer science), Web crawler, Web content, Cold start
DocType
Conference
ISBN
978-1-4503-6674-8
Citations
0
PageRank
0.34
References
0
Authors
8
Name                 Order  Citations  PageRank
Shuguang Han         1      168        18.43
Bernhard Brodowsky   2      0          0.34
Przemek Gajda        3      0          0.34
Sergey Novikov       4      0          0.68
Michael Bendersky    5      986        48.69
Marc A. Najork       6      2538       278.16
Robin Dua            7      0          0.34
Alexandrin Popescul  8      7          1.12