Title
Failure of One, Fall of Many: An Exploratory Study of Software Features for Defect Prediction
Abstract
Software defect prediction is an area of interest in both academia and the software industry, because software defects are prevalent in software development and can cause numerous difficulties for users and developers alike. The current literature offers multiple alternative approaches to predict the likelihood of defects in source code. Most of these studies predict defects from a broad set of software features. As a result, the individual discriminating power of each feature remains unknown, since some features perform well only with specific projects or metrics. In this study, we applied machine learning techniques to a popular dataset containing information about software defects in five Java projects, comprising 5,371 classes and 37 software features. To this end, we conducted an exploratory investigation that produced hundreds of thousands of machine learning models from a diverse collection of software features. These models are random in the sense that each one selects its features at random from the entire pool of features. Even though the vast majority of models are ineffective, we were able to produce several models that yield accurate predictions, correctly classifying defective classes in the Java projects. We concentrated our analysis on models that rank a randomly chosen defective class higher than a randomly chosen clean class with a probability above 80% (that is, an area under the ROC curve above 0.8). Among these accurate models, our results indicate that change metrics appear more often than entropy or class-level metrics. We also report and discuss some features that contribute to explaining model decisions. Our study therefore supports reasoning about which features help predict defects in these projects. Finally, we present the implications of our work for practitioners.
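The filtering criterion above, ranking a randomly chosen defective class above a randomly chosen clean class, is the probabilistic definition of the area under the ROC curve (AUC). The Python sketch below illustrates the general shape of such an exploratory procedure, under stated assumptions: feature subsets are drawn at random from a pool, each subset yields a score per class, subsets whose AUC exceeds 0.8 are kept, and feature frequencies are counted among the kept subsets. The feature names (`churn`, `entropy`, `loc`), the toy data, and the sum-of-metrics scorer are all hypothetical stand-ins; the study itself trained machine learning classifiers on 37 features, which this sketch does not reproduce.

```python
import random
from collections import Counter


def auc(scores, labels):
    """AUC as a ranking probability: the chance that a randomly chosen defective
    class (label 1) scores higher than a randomly chosen clean class (label 0).
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sample_feature_subset(features, rng):
    """Draw a random, non-empty subset of the feature pool (no repeats)."""
    k = rng.randint(1, len(features))
    return sorted(rng.sample(features, k))


def explore(data, labels, features, n_models=1000, threshold=0.8, seed=0):
    """Build many random-feature 'models', keep those whose AUC exceeds the
    threshold, and count how often each feature appears among the kept ones."""
    rng = random.Random(seed)
    counts = Counter()
    accurate = 0
    for _ in range(n_models):
        subset = sample_feature_subset(features, rng)
        # Hypothetical stand-in for a trained classifier: sum the selected metrics.
        scores = [sum(row[f] for f in subset) for row in data]
        if auc(scores, labels) > threshold:
            accurate += 1
            counts.update(subset)
    return accurate, counts


# Hypothetical toy data: one change metric, one entropy metric, one size metric.
FEATURES = ["churn", "entropy", "loc"]
DATA = [
    {"churn": 9, "entropy": 1, "loc": 5},  # defective
    {"churn": 8, "entropy": 2, "loc": 1},  # defective
    {"churn": 1, "entropy": 3, "loc": 4},  # clean
    {"churn": 0, "entropy": 2, "loc": 6},  # clean
]
LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    accurate, counts = explore(DATA, LABELS, FEATURES)
    print(f"{accurate} accurate models; feature frequency: {counts.most_common()}")
```

In this toy data, only subsets that include `churn` pass the 0.8 threshold, so `churn` appears in every kept model, mirroring in miniature the kind of frequency analysis the abstract describes. The pairwise AUC loop is quadratic in the number of classes; a rank-based formula or a library routine would scale better to thousands of classes.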
Year
2020
DOI
10.1109/SCAM51674.2020.00016
Venue
2020 IEEE 20th International Working Conference on Source Code Analysis and Manipulation (SCAM)
Keywords
defect prediction, explainability, source code metrics
DocType
Conference
ISSN
1942-5430
ISBN
978-1-7281-9249-9
Citations
0
PageRank
0.34
References
0
Authors
2
Name | Order | Citations | PageRank
Geanderson Esteves dos Santos | 1 | 0 | 0.34
Eduardo Figueiredo | 2 | 851 | 36.26