Title
An Empirical Study On Predictability Of Software Maintainability Using Imbalanced Data
Abstract
In predictive modeling for software engineering, early identification of the software modules or classes that will require high maintainability effort is a challenging task. Many models have been built to predict the maintainability of software classes or modules using various machine learning (ML) techniques. When only a small share of the modules or classes in a dataset require high maintainability effort, the data available to train the model is imbalanced. Imbalanced datasets bias ML techniques towards the majority class (low maintainability effort), and minority class instances tend to be discarded as noise. This paper presents an empirical study aimed at improving the performance of software maintainability prediction (SMP) models developed with ML techniques on imbalanced data. Before the models are developed, the imbalanced data is pre-processed with data resampling methods. Fourteen data resampling methods, covering oversampling, undersampling, and hybrid resampling, are used in the study. The results indicate that the safe-level synthetic minority oversampling technique (Safe-Level-SMOTE) is a useful method for handling imbalanced datasets and for developing competent models to predict software maintainability.
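To make the idea of oversampling concrete, the following is a minimal sketch of plain SMOTE-style interpolation, not the Safe-Level-SMOTE variant the abstract recommends and not the authors' implementation. The function name `smote_like_oversample`, its parameters, and the toy data are all hypothetical, chosen only for illustration. It assumes numeric feature vectors and Euclidean distance.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: two metric values per class.
# The minority group stands for classes with high maintainability effort.
majority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 1.9), (2.1, 2.4)]
minority = [(6.0, 7.0), (6.5, 7.5), (7.0, 6.8)]

# Generate just enough synthetic minority points to match the majority count.
new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Each synthetic point lies on a line segment between two existing minority points, so the minority region is filled in rather than merely duplicated. Safe-Level-SMOTE additionally weighs where along that segment to place a point according to how many minority neighbours surround each endpoint; that refinement is omitted here.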
Year
2020
DOI
10.1007/s11219-020-09525-y
Venue
SOFTWARE QUALITY JOURNAL
Keywords
Software maintainability prediction, Machine learning, Data resampling, Imbalanced learning
DocType
Journal
Volume
28
Issue
4
ISSN
0963-9314
Citations
1
PageRank
0.35
References
0
Authors
2
Name              Order  Citations  PageRank
Ruchika Malhotra  1      533        35.12
Kusum Lata        2      1          1.36