Abstract | ||
---|---|---|
Web page segmentation aims to break a page into sections that reveal its information presentation structure and appear coherent to readers. In this paper, we propose a new web page segmentation framework based on the process of analyzing and understanding web page structure. After extracting the segmentation graph structure, we formulate the label assignment task, which determines for each boundary on the graph whether it should segment the current block, as a structured learning problem. The highest-scoring label assignment is computed with the Viterbi algorithm, and a joint feature function captures the dependencies among boundaries. To learn the parameters, we adopt a learning model based on the perceptron algorithm. Furthermore, building on this framework, we propose a web information crawling application framework that integrates web page segmentation and semantic block classification. |
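
The decoding and learning steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the candidate boundaries form a simple linear chain (the paper works on a segmentation graph), that each boundary carries a small numeric feature vector, and that label 1 means "segment here" and 0 means "do not segment". The names `viterbi`, `train`, and `TOY_DATA`, and the two toy features, are hypothetical.

```python
from typing import List, Sequence, Tuple

LABELS = (0, 1)  # assumption: 0 = keep block whole, 1 = segment at this boundary

# Hypothetical toy data: each boundary has two made-up features,
# e.g. [visual gap, style similarity]; gold labels mark true segment boundaries.
TOY_DATA = [
    ([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]], [1, 0, 1]),
    ([[0.2, 0.8], [0.7, 0.3], [0.1, 0.9]], [0, 1, 0]),
]


def viterbi(x: Sequence[Sequence[float]],
            emit: List[List[float]],
            trans: List[List[float]]) -> List[int]:
    """Return the highest-scoring label sequence for a chain of boundaries.

    Score = sum of emit[y_i] . x_i plus sum of trans[y_{i-1}][y_i].
    """
    if not x:
        return []

    def local(i: int, lab: int) -> float:
        return sum(w * f for w, f in zip(emit[lab], x[i]))

    best = [[local(0, lab) for lab in LABELS]]  # best[i][lab]: best prefix score
    back = []                                   # back[i-1][lab]: best previous label
    for i in range(1, len(x)):
        row, ptr = [], []
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + trans[p][cur])
            row.append(best[-1][prev] + trans[prev][cur] + local(i, cur))
            ptr.append(prev)
        best.append(row)
        back.append(ptr)

    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def train(data: Sequence[Tuple[Sequence[Sequence[float]], List[int]]],
          dim: int, epochs: int = 10):
    """Structured perceptron: w += Phi(x, gold) - Phi(x, predicted) on mistakes."""
    emit = [[0.0] * dim for _ in LABELS]
    trans = [[0.0 for _ in LABELS] for _ in LABELS]
    for _ in range(epochs):
        for x, gold in data:
            pred = viterbi(x, emit, trans)
            if pred == gold:
                continue
            for i, (g, p) in enumerate(zip(gold, pred)):
                if g != p:  # emission features cancel when labels agree
                    for k, f in enumerate(x[i]):
                        emit[g][k] += f
                        emit[p][k] -= f
                if i > 0:   # transition features couple adjacent boundaries
                    trans[gold[i - 1]][g] += 1.0
                    trans[pred[i - 1]][p] -= 1.0
    return emit, trans


emit, trans = train(TOY_DATA, dim=2)
print([viterbi(x, emit, trans) for x, _ in TOY_DATA])
```

In this sketch the transition weights play the role of the joint feature function's dependency terms: the label chosen at one boundary changes the score of the label at the next, which is why decoding needs Viterbi rather than an independent per-boundary decision.
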
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/ICTAI.2016.0097 | 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI) |

Keywords | Field | DocType |
---|---|---|
Web page segmentation, Structured learning, Viterbi algorithm, Perceptron algorithm, Information crawling | Algorithm design, Crawling, Web page, Segmentation, Computer science, Structured prediction, Image segmentation, Artificial intelligence, Perceptron, Semantics, Machine learning | Conference |

ISSN | ISBN | Citations |
---|---|---|
1082-3409 | 978-1-5090-4460-3 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 12 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Hanyang Feng | 1 | 0 | 0.34 |
Wenzhe Zhang | 2 | 0 | 0.34 |
He-Sheng Wu | 3 | 3 | 0.75 |
Chongjun Wang | 4 | 90 | 38.99 |