Title
Web Page Segmentation and Its Application for Web Information Crawling
Abstract
Web page segmentation aims to break a page into sections that reveal its information presentation structure and appear coherent to readers. In this paper, we propose a new web page segmentation framework based on analyzing and understanding web page structure. After extracting the segmentation graph structure, we formulate label assignment on this graph, which decides for each boundary whether or not it should split the current block, as a structured learning problem. The highest-scoring label assignment is computed with the Viterbi algorithm, and a joint feature function captures the dependencies among boundaries. To learn the parameters, we adopt a learning model based on the perceptron algorithm. Furthermore, building on this framework, we propose a web information crawling application framework that integrates web page segmentation with semantic block classification.
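The decoding and training steps named in the abstract can be illustrated with a chain-structured special case: a standard structured perceptron with Viterbi decoding. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the boundaries form a linear sequence rather than the paper's general segmentation graph, assumes first-order transition features between neighbouring boundaries, and uses hypothetical per-boundary feature vectors and a hypothetical label encoding (0 = keep the boundary inside the current block, 1 = segment here). The names `local_features`, `joint_features`, `viterbi` and `train_perceptron` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label encoding: 0 = boundary stays inside the current block, 1 = boundary segments it.
LABELS = (0, 1)
FeatureVec = Dict[Tuple, float]


def local_features(x: Sequence[float], label: int) -> FeatureVec:
    """Per-boundary features: each observed feature conjoined with the label, plus a label bias."""
    feats: FeatureVec = {("emit", k, label): v for k, v in enumerate(x)}
    feats[("bias", label)] = 1.0
    return feats


def joint_features(xs: Sequence[Sequence[float]], ys: Sequence[int]) -> FeatureVec:
    """Assumed joint feature function Phi(x, y): summed local features plus neighbour transition counts."""
    phi: FeatureVec = {}
    prev = None
    for x, y in zip(xs, ys):
        for key, v in local_features(x, y).items():
            phi[key] = phi.get(key, 0.0) + v
        if prev is not None:
            phi[("trans", prev, y)] = phi.get(("trans", prev, y), 0.0) + 1.0
        prev = y
    return phi


def dot(w: FeatureVec, feats: FeatureVec) -> float:
    """Sparse dot product w . feats."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())


def viterbi(w: FeatureVec, xs: Sequence[Sequence[float]]) -> List[int]:
    """Highest-scoring label sequence under w . Phi(x, y) for a first-order chain of boundaries."""
    if not xs:
        return []
    best = [{y: dot(w, local_features(xs[0], y)) for y in LABELS}]
    back: List[Dict[int, int]] = [{}]
    for i in range(1, len(xs)):
        best.append({})
        back.append({})
        for y in LABELS:
            # best predecessor label p for label y at position i
            p, s = max(
                ((p, best[i - 1][p] + w.get(("trans", p, y), 0.0)) for p in LABELS),
                key=lambda t: t[1],
            )
            best[i][y] = s + dot(w, local_features(xs[i], y))
            back[i][y] = p
    # follow back-pointers from the best final label
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for i in range(len(xs) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


def train_perceptron(data, epochs: int = 10) -> FeatureVec:
    """Structured perceptron: on a mistake, w += Phi(x, y_true) - Phi(x, y_pred)."""
    w: FeatureVec = {}
    for _ in range(epochs):
        for xs, ys in data:
            pred = viterbi(w, xs)
            if pred != list(ys):
                for k, v in joint_features(xs, ys).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in joint_features(xs, pred).items():
                    w[k] = w.get(k, 0.0) - v
    return w


if __name__ == "__main__":
    # Toy data; hypothetical features per boundary: [large visual gap, same tag on both sides].
    data = [
        ([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], [1, 0, 1]),
        ([[0.0, 1.0], [1.0, 0.0]], [0, 1]),
    ]
    w = train_perceptron(data, epochs=5)
    print(viterbi(w, [[1.0, 0.0], [0.0, 1.0]]))  # [1, 0]
```

Because the score is linear in the joint feature vector, Viterbi can add per-boundary and transition scores position by position and still return the exact argmax over all label sequences of the chain. Averaging the weights over updates is a common refinement of the perceptron; the abstract does not say whether the authors use it.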
Year
2016
DOI
10.1109/ICTAI.2016.0097
Venue
2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI)
Keywords
Web page segmentation, Structured learning, Viterbi algorithm, Perceptron algorithm, Information crawling
Field
Algorithm design, Crawling, Web page, Segmentation, Computer science, Structured prediction, Image segmentation, Artificial intelligence, Perceptron, Semantics, Machine learning
DocType
Conference
ISSN
1082-3409
ISBN
978-1-5090-4460-3
Citations
0
PageRank
0.34
References
12
Authors
4
Name            Order  Citations  PageRank
Hanyang Feng    1      0          0.34
Wenzhe Zhang    2      0          0.34
He-Sheng Wu     3      3          0.75
Chongjun Wang   4      90         38.99