Title
Web page classification based on uncorrelated semi-supervised intra-view and inter-view manifold discriminant feature extraction
Abstract
Web page classification has attracted increasing research interest. It is intrinsically a multi-view and semi-supervised problem: web pages usually contain two or more types of data, such as text, hyperlinks and images, and unlabeled pages generally far outnumber labeled ones. Web page data is also commonly high-dimensional. Extracting useful features from such data in the multi-view semi-supervised scenario is therefore important for web page classification. To our knowledge, only one method has been specifically proposed for this topic, and the few semi-supervised multi-view feature extraction methods developed for other applications still leave much room for improvement. In this paper, we first design a feature extraction schema called semi-supervised intra-view and inter-view manifold discriminant (SI2MD) learning, which fully exploits the intra-view and inter-view discriminant information of labeled samples and the local neighborhood structures of unlabeled samples. We then design a semi-supervised uncorrelation constraint for the SI2MD schema to remove the multi-view correlation in the semi-supervised scenario. By combining the SI2MD schema with this constraint, we propose an uncorrelated semi-supervised intra-view and inter-view manifold discriminant (USI2MD) learning approach for web page classification. Experiments on public web page databases validate the proposed approach.
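The abstract does not state the SI2MD or USI2MD objective, so the following is only a minimal sketch of the general ingredients it names, under stated assumptions. It uses a single feature matrix (for example, all views concatenated), a between-class minus within-class scatter term computed from labeled samples, a k-nearest-neighbor graph Laplacian over labeled and unlabeled samples to preserve local neighborhood structure, and an uncorrelation constraint W^T S_t W = I, where S_t is the total scatter of all samples. The function names, the trace-difference objective, and the parameters `alpha`, `k` and `eps` are hypothetical choices for illustration; inter-view terms are omitted entirely.

```python
import numpy as np


def knn_laplacian(X, k):
    """Laplacian L = D - S of a symmetric 0/1 k-nearest-neighbor graph."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                               # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]                       # k nearest neighbors per row
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    S = np.maximum(S, S.T)                                     # symmetrize the graph
    return np.diag(S.sum(axis=1)) - S


def semi_supervised_uncorrelated_projection(X_lab, y, X_unl, n_components,
                                            alpha=0.1, k=5, eps=1e-6):
    """Hypothetical illustration, NOT the paper's USI2MD objective.

    Maximizes tr(W^T (Sb - Sw - alpha * Sl) W) subject to W^T St W = I.
    y must be a NumPy array of class labels aligned with the rows of X_lab.
    """
    X_all = np.vstack([X_lab, X_unl])
    mu = X_all.mean(axis=0)
    d = X_all.shape[1]

    # Discriminant information from labeled samples only.
    mu_lab = X_lab.mean(axis=0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xk = X_lab[y == c]
        mk = Xk.mean(axis=0)
        Sb += len(Xk) * np.outer(mk - mu_lab, mk - mu_lab)
        Sw += (Xk - mk).T @ (Xk - mk)

    # Local neighborhood structure from labeled + unlabeled samples.
    Xc = X_all - mu
    Sl = Xc.T @ knn_laplacian(X_all, k) @ Xc

    # Total scatter of all samples; eps keeps it positive definite.
    St = Xc.T @ Xc + eps * np.eye(d)

    # Solve A w = lambda St w by whitening with the Cholesky factor St = C C^T.
    A = Sb - Sw - alpha * Sl
    Ci = np.linalg.inv(np.linalg.cholesky(St))
    M = Ci @ A @ Ci.T
    vals, V = np.linalg.eigh((M + M.T) / 2)
    top = np.argsort(vals)[::-1][:n_components]               # largest eigenvalues first
    W = Ci.T @ V[:, top]                                       # guarantees W^T St W = I
    return W, mu
```

Because the projection vectors are S_t-orthonormal, the extracted features `Z = (X - mu) @ W` have (up to the small `eps` regularizer) zero sample covariance between components over all labeled and unlabeled samples, which is one possible reading of a semi-supervised uncorrelation constraint.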
Year: 2015
Venue: IJCAI
Field: Pattern recognition, Web page, Discriminant, Computer science, Uncorrelated, Feature extraction, Data type, Artificial intelligence, Hyperlink, Schema (psychology), Manifold
DocType: Conference
Citations: 4
PageRank: 0.39
References: 25
Authors: 6
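Author list (Order. Name: Citations, PageRank)
1. Xiao-Yuan Jing: 769 citations, PageRank 55.18
2. Qian Liu: 4 citations, PageRank 1.74
3. Fei Wu: 2209 citations, PageRank 153.88
4. Baowen Xu: 2476 citations, PageRank 165.27
5. Yang-Ping Zhu: 10 citations, PageRank 0.84
6. Songcan Chen: 4148 citations, PageRank 191.89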