Title
A novel ensemble decision tree based on under-sampling and clonal selection for web spam detection.
Abstract
Web spamming is currently a serious problem for search engines. It not only degrades the quality of search results by deliberately promoting undesirable web pages to users, but also causes the search engine to waste a significant amount of computational and storage resources on processing useless information. In this paper, we present a novel ensemble classifier for web spam detection that combines the clonal selection algorithm for feature selection with under-sampling for data balancing. We call this web spam detection system USCS. The USCS ensemble can automatically sample the training data and select its sub-classifiers. First, the system converts the imbalanced training dataset into several balanced datasets using under-sampling. Second, it automatically selects optimal feature subsets for the individual sub-classifiers using a customized clonal selection algorithm. Third, it builds several C4.5 decision tree sub-classifiers from these balanced datasets, each using its specified features. Finally, these sub-classifiers are combined into an ensemble decision tree classifier, which is applied to classify the examples in the testing data. Experiments on the WEBSPAM-UK2006 dataset show that the proposed USCS ensemble web spam classifier achieves significantly better classification performance than several baseline systems and state-of-the-art approaches.
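The four steps above can be illustrated with a small, self-contained Python sketch. It is a hypothetical illustration under stated assumptions, not the authors' implementation: under-sampling is assumed to pair all minority (spam) examples with disjoint random chunks of the majority class; the clonal selection step is a simple CLONALG-style search over binary feature masks whose affinity (training accuracy minus a small feature-count penalty) is made up for illustration; a depth-1 decision stump split by C4.5's gain ratio stands in for a full C4.5 tree; and the sub-classifiers are assumed to be combined by majority vote. All names and parameter values (`undersample`, `best_stump`, `fitness`, `clonal_selection`, `fit_uscs`, `pop_size`, `n_clones`, `generations`) are illustrative.

```python
import math
import random
from collections import Counter

# Hypothetical USCS-style sketch; an example is (features: list[float], label), 1 = spam, 0 = normal.

def majority(labels, default):
    """Most common label, or `default` when `labels` is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def undersample(data, rng):
    """Step 1: pair all minority examples with disjoint chunks of the shuffled majority class."""
    pos = [d for d in data if d[1] == 1]
    neg = [d for d in data if d[1] == 0]
    minority, majority_cls = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    majority_cls = majority_cls[:]
    rng.shuffle(majority_cls)
    k = max(1, len(majority_cls) // len(minority))
    return [minority + majority_cls[i::k] for i in range(k)]

def best_stump(data, features):
    """Step 3 stand-in: a depth-1 tree whose split maximizes C4.5's gain ratio."""
    labels = [y for _, y in data]
    base = entropy(labels)
    best = (-1.0, features[0], 0.0)  # (gain ratio, feature index, threshold)
    for f in features:
        values = sorted({x[f] for x, _ in data})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint threshold, so both branches are non-empty
            left = [y for x, y in data if x[f] <= t]
            right = [y for x, y in data if x[f] > t]
            p = len(left) / len(data)
            gain = base - p * entropy(left) - (1 - p) * entropy(right)
            split_info = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
            if gain / split_info > best[0]:
                best = (gain / split_info, f, t)
    _, f, t = best
    default = majority(labels, 0)
    left_y = majority([y for x, y in data if x[f] <= t], default)
    right_y = majority([y for x, y in data if x[f] > t], default)
    return lambda x: left_y if x[f] <= t else right_y

def fitness(mask, data):
    """Assumed affinity: training accuracy minus a small penalty per selected feature."""
    feats = [i for i, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0
    model = best_stump(data, feats)
    acc = sum(model(x) == y for x, y in data) / len(data)
    return acc - 0.01 * len(feats) / len(mask)

def clonal_selection(data, n_features, rng, pop_size=6, n_clones=3, generations=10):
    """Step 2: CLONALG-style search over binary feature masks (antibodies)."""
    def score(m):
        return fitness(m, data)
    def random_mask():
        return [rng.randint(0, 1) for _ in range(n_features)]
    pop = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        clones = []
        for rank, mask in enumerate(ranked):
            # Higher-affinity antibodies get more clones and a lower mutation rate.
            rate = (rank + 1) / (pop_size + 1)
            for _ in range(max(1, n_clones - rank)):
                clones.append([1 - b if rng.random() < rate else b for b in mask])
        # Keep the best antibodies and inject one random newcomer for diversity.
        pool = sorted(ranked + clones, key=score, reverse=True)
        pop = pool[: pop_size - 1] + [random_mask()]
    return max(pop, key=score)

def fit_uscs(data, n_features, seed=0):
    """Step 4: one stump per balanced subset, combined by (assumed) majority vote."""
    rng = random.Random(seed)
    members = []
    for subset in undersample(data, rng):
        mask = clonal_selection(subset, n_features, rng)
        feats = [i for i, b in enumerate(mask) if b] or list(range(n_features))
        members.append(best_stump(subset, feats))
    return lambda x: majority([m(x) for m in members], 1)

if __name__ == "__main__":
    gen = random.Random(1)
    # Toy imbalanced data: feature 0 separates the classes, feature 1 is noise.
    data = [([gen.uniform(0.6, 1.0), gen.random()], 1) for _ in range(10)]
    data += [([gen.uniform(0.0, 0.4), gen.random()], 0) for _ in range(60)]
    model = fit_uscs(data, n_features=2)
    print(model([0.9, 0.5]), model([0.1, 0.5]))
```

With this toy data, each balanced subset pairs the 10 spam examples with 10 of the 60 normal ones, so `undersample` yields six sub-classifiers. A real system along the lines of the abstract would grow full (pruned) C4.5 trees and would likely score feature masks on held-out data rather than on the training subset.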
Year
2018
DOI
10.1007/s10044-017-0602-2
Venue
Pattern Anal. Appl.
Keywords
Web spam detection, Ensemble learning, Clonal selection algorithm, Feature selection, Decision trees
Field
Data mining, Web page, Feature selection, Computer science, Artificial intelligence, Ensemble learning, Pattern recognition, Boosting (machine learning), Clonal selection algorithm, Machine learning, Decision tree learning, Spamming, Spamdexing
DocType
Journal
Volume
21
Issue
3
ISSN
1433-755X
Citations
0
PageRank
0.34
References
27
Authors
5
Name | Order | Citations | PageRank
Xiao-Yong Lu | 1 | 0 | 0.68
Mu-Sheng Chen | 2 | 0 | 0.34
Jheng-Long Wu | 3 | 95 | 9.54
Pei-Chann Chang | 4 | 1752 | 109.32
Meng-Hui Chen | 5 | 28 | 2.55