Title
VRPSOFC: a framework for focused crawler using mutation improving particle swarm optimization algorithm
Abstract
The focused crawler is a key technology of search engines. It filters webpages using relevance algorithms until certain stopping conditions are met. Current focused crawlers are prone to topic drift and low precision while crawling webpages. This paper therefore proposes a focused crawler framework (VRPSOFC) based on mutation-improved particle swarm optimization. First, for each topic, VRPSOFC selects three types of seed pages that readily lead to large-scale aggregations of webpages, chosen according to page click rates in Google search: an official website, a Wikipedia page, and a forum or video page. VRPSOFC then crawls webpages with the mutation-improved particle swarm optimization algorithm proposed in the paper, using each seed page as an initial page. Finally, experiments are conducted in a real web environment and the results are analyzed. Compared with the traditional vector space model (VSM) and other methods, VRPSOFC obtains more accurate URL priorities and crawls higher-quality webpages. The results indicate that the proposed focused crawler framework is effective.
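The abstract names particle swarm optimization with a mutation step but does not give the update rules, so the following is only a generic sketch of that family of algorithms, not the VRPSOFC method itself. Everything in it is an assumption made for illustration: the function names `sphere` and `pso_with_mutation`, the inertia and acceleration coefficients, the Gaussian mutation, and the toy objective, which stands in for whatever relevance-based fitness a crawler would actually use.

```python
import random


def sphere(x):
    """Toy objective to minimize (hypothetical stand-in for a real fitness)."""
    return sum(v * v for v in x)


def pso_with_mutation(objective, dim=2, n_particles=20, iters=200,
                      w=0.7, c1=1.5, c2=1.5,
                      mutation_rate=0.1, mutation_scale=0.5,
                      bounds=(-5.0, 5.0), seed=0):
    """Generic PSO with a Gaussian mutation step (all parameters are assumptions)."""
    rng = random.Random(seed)
    lo, hi = bounds

    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the global best seen so far.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

            # Mutation: occasionally perturb one coordinate to keep diversity.
            if rng.random() < mutation_rate:
                d = rng.randrange(dim)
                mutated = pos[i][d] + rng.gauss(0.0, mutation_scale)
                pos[i][d] = min(hi, max(lo, mutated))

            # Bests only ever improve, so mutation cannot lose a good solution.
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

    return gbest, gbest_val


if __name__ == "__main__":
    best, best_val = pso_with_mutation(sphere)
    print(best, best_val)
```

In a crawler setting, the position vector and objective would have to be replaced by whatever encoding and relevance score the framework defines; the abstract does not state them, so they are left abstract here.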
Year
2019
DOI
10.1145/3321408.3323081
Venue
Proceedings of the ACM Turing Celebration Conference - China
Keywords
focused crawler, mutation, particle swarm algorithm, precision, topic-drift
Field
Particle swarm optimization, Mathematical optimization, Computer science, Focused crawler
DocType
Conference
ISBN
978-1-4503-7158-2
Citations
0
PageRank
0.34
References
0
Authors
5
Name               Order  Citations  PageRank
Guangxia Xu        1      42         9.46
Peng Jiang         2      259        42.86
Chuang Ma          3      16         7.00
Mahmoud Daneshmand 4      345        46.70
Shaoci Xie         5      0          0.34