Research On Crawling Mechanism And Policy For Crawling Product Information From Mobile Internet - Citegraph

Paper Info

Title
Research On Crawling Mechanism And Policy For Crawling Product Information From Mobile Internet

Abstract
Product information on the mobile internet is growing rapidly in volume and becoming harder to acquire. Companies tend to deliver product information through well-tuned mobile websites or through websites that are responsive to various mobile devices. Such a site is therefore more of a web app than a traditional website, and we call it a rich internet application (RIA). In RIAs, information is hidden from search engine spiders in the deep web by means of HTML5, Ajax and other scripting techniques, and user interactions are needed to trigger prescribed events in a certain order before the full information we need is shown. In this paper, we identify the crux of the problem as providing a mechanism to parse the scripts and manipulate the document object model (DOM), together with a policy to trigger user events and run the scrape process. A new mechanism and policy were formulated based on web crawler techniques and on studies of Ajax-specific web crawlers. By remodelling web pages, redesigning the architecture of the web crawler and refining the scrape algorithm, we successfully scrape product data from mobile internet RIAs.
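
The abstract does not spell out the authors' algorithm. As a minimal sketch only, assuming a generic state-exploration strategy of the kind discussed in Ajax-crawler research (not the paper's own mechanism), the Python below treats each UI state of an RIA as a DOM snapshot, fires user events to reach new states, skips states whose DOM hash has already been seen, and scrapes product names from every state it visits. A real crawler would drive a headless browser; here every name is hypothetical (`crawl_ria`, `dom_fingerprint`, `extract_products`, `render`, `load_more`, `reset`, `CATALOG`, and the `<li class="product">` markup), and the "page" is a toy stand-in.

```python
import hashlib
import re
from collections import deque
from typing import Callable, Dict, List, Tuple

# A UI "state" is represented only by its DOM as an HTML string; an event is a
# function from one DOM to the next. This is a hypothetical stand-in for a
# headless browser, which a real crawler would drive to fire events.
Event = Callable[[str], str]


def dom_fingerprint(dom: str) -> str:
    """Identify a UI state by hashing its DOM so revisited states are skipped."""
    return hashlib.sha256(dom.encode("utf-8")).hexdigest()


def extract_products(dom: str) -> List[str]:
    """Scrape product names from hypothetical <li class="product"> elements."""
    return re.findall(r'<li class="product">([^<]+)</li>', dom)


def crawl_ria(initial_dom: str, events: Dict[str, Event],
              max_states: int = 100) -> Tuple[List[str], int]:
    """Breadth-first search over DOM states reachable by firing events.

    Returns the de-duplicated product names and the number of distinct states.
    """
    seen = {dom_fingerprint(initial_dom)}
    queue = deque([initial_dom])
    products: List[str] = []
    while queue:
        dom = queue.popleft()
        for name in extract_products(dom):
            if name not in products:
                products.append(name)
        for fire in events.values():
            new_dom = fire(dom)
            fp = dom_fingerprint(new_dom)
            # Enqueue only unseen states, and stop growing past the budget.
            if fp not in seen and len(seen) < max_states:
                seen.add(fp)
                queue.append(new_dom)
    return products, len(seen)


# --- Toy mobile product page (hypothetical data) ---
CATALOG = ["Phone A", "Phone B", "Tablet C", "Watch D"]
PAGE_SIZE = 2


def render(shown: int) -> str:
    items = "".join(f'<li class="product">{p}</li>' for p in CATALOG[:shown])
    return f"<ul>{items}</ul>"


def load_more(dom: str) -> str:
    # Simulates an Ajax "load more" button that appends the next page of items.
    shown = len(extract_products(dom))
    return render(min(shown + PAGE_SIZE, len(CATALOG)))


def reset(dom: str) -> str:
    # Simulates navigating back to the initial view (an already-seen state).
    return render(PAGE_SIZE)


if __name__ == "__main__":
    found, n_states = crawl_ria(render(PAGE_SIZE),
                                {"load_more": load_more, "reset": reset})
    print(found, n_states)  # ['Phone A', 'Phone B', 'Tablet C', 'Watch D'] 2
```

Hashing the full DOM is the simplest state-equivalence test; on real RIAs, timestamps, ads or session tokens change the DOM without changing the content, so practical crawlers usually compare a normalized or partial DOM instead.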

Year	DOI	Venue
2017	10.1504/IJCSM.2017.088946	INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS
Keywords	Field	DocType
crawler, scrape data, mobile internet, rich internet application, RIA, product information	World Wide Web,Mathematical optimization,HTML5,Web page,Mobile device,Ajax,Document Object Model,Web application,Web crawler,Mathematics,Rich Internet application	Journal
Volume	Issue	ISSN
8	6	1752-5055
Citations	PageRank	References
0	0.34	0
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Shu Wang	1	228	28.72
Jia Chen	2	54	16.18
Chonghuan Xu	3	0	0.34
