Title
Cost-sensitive transfer kernel canonical correlation analysis for heterogeneous defect prediction.
Abstract
Cross-project defect prediction (CPDP) predicts defects in a target project using models trained on historical data from other (source) projects. CPDP in the scenario where the source and target projects have different metric sets is called heterogeneous defect prediction (HDP). HDP has recently received much research interest. Existing HDP methods consider only the linear correlation among the features (metrics) of the source and target projects, so they cannot capture nonlinear correlations among the features and may suffer from linear inseparability in the linear feature space. Furthermore, existing HDP methods do not take the class imbalance problem into consideration, even though the imbalanced nature of software defect datasets makes learning harder for the predictors. In this paper, we propose a new cost-sensitive transfer kernel canonical correlation analysis (CTKCCA) approach for HDP. CTKCCA not only makes the data distributions of the source and target projects much more similar in a nonlinear feature space, where the learned features have favorable separability, but also assigns different misclassification costs to the defective and defect-free classes to alleviate the class imbalance problem. For evaluation, we use the Friedman test with Nemenyi’s post-hoc test and the Cliff’s delta effect size. Extensive experiments on 28 public projects from five data sources indicate that: (1) CTKCCA performs significantly better than related CPDP methods; and (2) CTKCCA performs better than related state-of-the-art HDP methods.
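The abstract names Cliff's delta as the effect size used in the evaluation. As a minimal sketch (not the authors' code), the statistic can be estimated as the fraction of score pairs where one method beats the other, minus the fraction where it loses. The function names `cliffs_delta` and `magnitude`, the sample scores, and the magnitude thresholds (values commonly used in software engineering studies) are all illustrative assumptions, not details taken from the paper.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Estimate Cliff's delta, P(X > Y) - P(X < Y), over all (x, y) pairs."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    # Ties contribute to neither count; the result lies in [-1, 1].
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta):
    """Label |delta| with commonly used thresholds (assumed, not from the paper)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Hypothetical per-project scores for two methods (illustration only).
scores_a = [0.62, 0.58, 0.71, 0.66]
scores_b = [0.55, 0.60, 0.52, 0.49]
d = cliffs_delta(scores_a, scores_b)
print(d, magnitude(d))  # 0.875 large
```

Here 15 of the 16 pairs favor `scores_a` and one favors `scores_b`, giving (15 - 1) / 16 = 0.875. Swapping the arguments flips the sign, which is why the magnitude label uses the absolute value.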
Year: 2018
DOI: https://doi.org/10.1007/s10515-017-0220-7
Venue: Autom. Softw. Eng.
Keywords: Heterogeneous defect prediction, Kernel canonical correlation analysis, Class imbalance, Transfer learning, Cost-sensitive learning
Field: Friedman test, Data mining, Feature vector, Kernel canonical correlation analysis, Nonlinear system, Computer science, Transfer of learning, Software bug, Artificial intelligence, Predictive modelling, Machine learning, Statistical hypothesis testing
DocType: Journal
Volume: 25
Issue: 2
ISSN: 0928-8910
Citations: 8
PageRank: 0.40
References: 79
Authors: 6
Name | Order | Citations | PageRank
Zhiqiang Li | 1 | 44 | 3.41
Xiao-Yuan Jing | 2 | 769 | 55.18
Fei Wu | 3 | 12 | 0.90
Xiaoke Zhu | 4 | 119 | 6.59
Baowen Xu | 5 | 2476 | 165.27
Shi Ying | 6 | 334 | 31.11