Title | ||
---|---|---|
A novel ensemble decision tree based on under-sampling and clonal selection for web spam detection. |
Abstract | ||
---|---|---|
Web spamming is a serious problem for search engines. It degrades the quality of search results by deliberately promoting undesirable web pages to users, and it causes search engines to waste significant computational and storage resources on processing useless information. In this paper, we present a novel ensemble classifier for web spam detection, called USCS, which combines under-sampling for data balancing with the clonal selection algorithm for feature selection. The USCS ensemble classifier automatically samples the training data and selects its sub-classifiers. First, the system converts the imbalanced training dataset into several balanced datasets using under-sampling. Second, it automatically selects several optimal feature subsets for the sub-classifiers using a customized clonal selection algorithm. Third, it builds several C4.5 decision tree sub-classifiers from these balanced datasets, each using its specified features. Finally, these sub-classifiers are combined into an ensemble decision tree classifier, which is applied to classify the examples in the test data. Experiments on the WEBSPAM-UK2006 dataset show that the proposed USCS ensemble web spam classifier achieves significant improvements in classification performance over several baseline systems and state-of-the-art approaches. |
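The four steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: labels are assumed to be binary (0 = normal, 1 = spam), a single-feature threshold rule stands in for the C4.5 decision tree, the feature subsets are supplied by hand instead of being found by clonal selection, and the sub-classifiers are combined by simple majority vote. All function names (`balanced_subsets`, `train_stump`, `train_ensemble`, `predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter

def balanced_subsets(X, y, n_subsets, rng):
    """Under-sample the majority class into n_subsets balanced datasets."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    subsets = []
    for _ in range(n_subsets):
        # Keep every minority example; draw an equal number of majority ones.
        idx = min_idx + rng.sample(maj_idx, len(min_idx))
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets

def train_stump(X, y, features):
    """Stand-in for C4.5 (assumption): best one-feature threshold rule."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for pos in (0, 1):  # label predicted when row[f] > t
                acc = sum((pos if row[f] > t else 1 - pos) == label
                          for row, label in zip(X, y))
                if best is None or acc > best[0]:
                    best = (acc, f, t, pos)
    _, f, t, pos = best
    return lambda row: pos if row[f] > t else 1 - pos

def train_ensemble(X, y, feature_subsets, seed=0):
    """One sub-classifier per (balanced dataset, feature subset) pair."""
    rng = random.Random(seed)
    subsets = balanced_subsets(X, y, len(feature_subsets), rng)
    return [train_stump(Xs, ys, feats)
            for (Xs, ys), feats in zip(subsets, feature_subsets)]

def predict(ensemble, row):
    """Majority vote over the sub-classifiers (assumed combination rule)."""
    votes = Counter(clf(row) for clf in ensemble)
    return votes.most_common(1)[0][0]

# Toy imbalanced data (hypothetical): 20 normal pages, 5 spam pages.
X = [[i * 0.1, i % 3] for i in range(20)] + [[5 + i, i % 3] for i in range(5)]
y = [0] * 20 + [1] * 5

# Hand-picked feature subsets; the paper selects these by clonal selection.
ensemble = train_ensemble(X, y, feature_subsets=[[0], [0], [0, 1]])
print(predict(ensemble, [6.0, 0]), predict(ensemble, [0.0, 0]))
```

Keeping every minority example while redrawing the majority sample for each sub-classifier means each tree sees a balanced dataset, yet the ensemble as a whole still covers much of the majority class.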
Year | DOI | Venue |
---|---|---|
2018 | 10.1007/s10044-017-0602-2 | Pattern Anal. Appl. |
Keywords | Field | DocType |
Web spam detection, Ensemble learning, Clonal selection algorithm, Feature selection, Decision trees | Data mining, Web page, Feature selection, Computer science, Artificial intelligence, Ensemble learning, Pattern recognition, Boosting (machine learning), Clonal selection algorithm, Machine learning, Decision tree learning, Spamming, Spamdexing | Journal |
Volume | Issue | ISSN |
21 | 3 | 1433-755X |
Citations | PageRank | References |
0 | 0.34 | 27 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Xiao-Yong Lu | 1 | 0 | 0.68 |
Mu-Sheng Chen | 2 | 0 | 0.34 |
Jheng-Long Wu | 3 | 95 | 9.54 |
Pei-Chann Chang | 4 | 1752 | 109.32 |
Meng-Hui Chen | 5 | 28 | 2.55 |