Title
CLAMI: Defect Prediction on Unlabeled Datasets (T)
Abstract
Defect prediction on new projects or projects with limited historical data is an interesting problem in software engineering. This is largely because it is difficult to collect defect information to label a dataset for training a prediction model. Cross-project defect prediction (CPDP) has tried to address this problem by reusing prediction models built by other projects that have enough historical data. However, CPDP does not always build a strong prediction model because of the different distributions among datasets. Approaches for defect prediction on unlabeled datasets have also tried to address the problem by adopting unsupervised learning but it has one major limitation, the necessity for manual effort. In this study, we propose novel approaches, CLA and CLAMI, that show the potential for defect prediction on unlabeled datasets in an automated manner without need for manual effort. The key idea of the CLA and CLAMI approaches is to label an unlabeled dataset by using the magnitude of metric values. In our empirical study on seven open-source projects, the CLAMI approach led to the promising prediction performances, 0.636 and 0.723 in average f-measure and AUC, that are comparable to those of defect prediction based on supervised learning.
Year: 2015
DOI: 10.1109/ASE.2015.56
Venue: Automated Software Engineering
Keywords: CLAMI approach, cross-project defect prediction, software engineering, defect information collection, unsupervised learning, CLA approach, supervised learning
Field: Data modeling, Data mining, Semi-supervised learning, Computer science, Supervised learning, Software, Unsupervised learning, Artificial intelligence, Predictive modelling, Empirical research, Machine learning
DocType: Conference
ISSN: 1527-1366
Citations: 37
PageRank: 0.70
References: 51
Authors: 2
Name, Order, Citations, PageRank
Jaechang Nam, 1, 381, 11.59
Sunghun Kim, 2, 3036, 114.11