Title
Research On Crawling Mechanism And Policy For Crawling Product Information From Mobile Internet
Abstract
Product information on the mobile internet is growing rapidly in volume and becoming harder to acquire. Companies tend to deliver product information through well-tuned mobile websites or websites that are responsive to various mobile devices. Such a site is more of a web app than a traditional website, and we call it a rich internet application (RIA). In RIAs, information is hidden from search engine spiders in the deep web by means of HTML5, Ajax and other scripting techniques; user interactions are needed to trigger prescribed events in a certain order before the full information is shown. In this paper, we identify the crux of the problem as how to provide a mechanism to parse the scripts and manipulate the document object model (DOM), and a policy to trigger user events and run the scrape process. A new mechanism and policy were formulated based on web crawler techniques and studies of Ajax-specific web crawlers. By remodelling web pages, redesigning the architecture of the web crawler and refining the scrape algorithm, we successfully scrape product data from mobile internet RIAs.
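The idea of triggering user events in a certain order to reveal hidden product information can be illustrated with a minimal sketch. It is not the mechanism or policy formulated in the paper; it is a generic breadth-first exploration over a toy state graph, and every name in it (TOY_APP, crawl, the state and event names, the product values) is a hypothetical placeholder chosen for illustration. A real crawler would replace the dictionary lookup with a script-capable browser engine that fires events on the live DOM.

```python
from collections import deque

# Hypothetical toy RIA: each DOM state exposes some product fields and a set
# of user events; firing an event leads to another DOM state.
TOY_APP = {
    "list": {"fields": {}, "events": {"tap_item": "detail"}},
    "detail": {
        "fields": {"name": "Phone X"},
        "events": {"tap_specs": "specs", "tap_price": "price"},
    },
    "specs": {"fields": {"screen": "5.5 in"}, "events": {}},
    "price": {"fields": {"price": "299"}, "events": {"tap_specs": "specs"}},
}


def crawl(app, start):
    """Explore DOM states breadth-first by firing events from each state.

    Returns the merged product fields and, for every reached state, the
    ordered list of events that leads to it from the start state.
    """
    seen = {start}
    queue = deque([(start, [])])
    product = {}
    paths = {}
    while queue:
        state, path = queue.popleft()
        paths[state] = path
        # Scrape whatever fields are visible in this state.
        product.update(app[state]["fields"])
        # Fire each event once; skip states that were already discovered.
        for event, target in app[state]["events"].items():
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [event]))
    return product, paths


product, paths = crawl(TOY_APP, "list")
```

In this sketch, paths records the event sequence needed to reach each state, which is the kind of ordering information a crawling policy would have to decide on; the breadth-first order is only one possible choice.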
Year
2017
DOI
10.1504/IJCSM.2017.088946
Venue
International Journal of Computing Science and Mathematics
Keywords
crawler, scrape data, mobile internet, rich internet application, RIA, product information
Field
World Wide Web, Mathematical optimization, HTML5, Web page, Mobile device, Ajax, Document Object Model, Web application, Web crawler, Mathematics, Rich Internet application
DocType
Journal
Volume
8
Issue
6
ISSN
1752-5055
Citations
0
PageRank
0.34
References
0
Authors
3
Name            Order   Citations   PageRank
Shu Wang        1       228         28.72
Jia Chen        2       54          16.18
Chonghuan Xu    3       0           0.34