Predicting defects in imbalanced data using resampling methods: an empirical investigation - Citegraph

Paper Info

Title
Predicting defects in imbalanced data using resampling methods: an empirical investigation

Abstract
The development of correct and effective software defect prediction (SDP) models is one of the foremost needs of the software industry. Statistics from many defect-related open-source data sets reveal a class imbalance problem in object-oriented projects. Models trained on imbalanced data lead to inaccurate future predictions because of biased learning, which makes defect prediction ineffective. In addition, a large number of software metrics degrades model performance. This study aims at (1) identifying useful metrics in the software using correlation feature selection, (2) performing an extensive comparative analysis of 10 resampling methods to generate effective machine learning models for imbalanced data, (3) including stable performance evaluators (AUC, GMean, and Balance), and (4) integrating statistical validation of the results. The impact of the 10 resampling methods is analyzed on selected features of 12 object-oriented Apache datasets using 15 machine learning techniques. The performance of the developed models is evaluated using AUC, GMean, Balance, and sensitivity. The statistical results support the use of resampling methods to improve SDP. Among the resampling methods, random oversampling gives the developed defect prediction models the best predictive capability. The study provides a guideline for identifying metrics that are influential for SDP. Oversampling methods outperform undersampling methods.
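The abstract names random oversampling and the AUC, GMean, Balance, and sensitivity evaluators without defining them. The following is a minimal, stdlib-only Python sketch of the common textbook definitions, not the authors' code. All function names are hypothetical. The Balance formula assumes the usual SDP definition: one minus the normalized distance from (pf, pd) to the ideal point (pf = 0, pd = 1), where pd is sensitivity and pf is the false alarm rate.

```python
import math
import random


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    shortfall = (len(y) - len(minority_idx)) - len(minority_idx)
    if not minority_idx or shortfall <= 0:
        return list(X), list(y)  # nothing to copy, or classes already balanced
    picks = [rng.choice(minority_idx) for _ in range(shortfall)]
    return list(X) + [X[i] for i in picks], list(y) + [y[i] for i in picks]


def detection_rates(y_true, y_pred, positive=1):
    """Return (pd, pf): sensitivity on defective modules and the false alarm rate."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return pd, pf


def gmean(pd, pf):
    """Geometric mean of sensitivity (pd) and specificity (1 - pf)."""
    return math.sqrt(pd * (1 - pf))


def balance(pd, pf):
    """1 minus the Euclidean distance from (pf, pd) to the ideal point (0, 1), scaled by sqrt(2)."""
    return 1 - math.sqrt(pf ** 2 + (1 - pd) ** 2) / math.sqrt(2)


def auc(y_true, scores, positive=1):
    """Probability that a random defective module outscores a random clean one (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
    y = [0, 0, 0, 0, 0, 1]  # one defective module among six
    X_res, y_res = random_oversample(X, y)
    print(sum(y_res), len(y_res))  # 5 10

    pd, pf = detection_rates([1, 1, 0, 0, 0], [1, 0, 0, 1, 0])
    print(pd, round(gmean(pd, pf), 3), round(balance(pd, pf), 3))
```

In this sketch, oversampling is meant for the training split only. Duplicating rows in a test split would let copies of the same module appear on both sides of the evaluation and inflate the scores. GMean and Balance both penalize a model that reaches high sensitivity simply by flagging nearly every module as defective, which is why they are preferred over plain accuracy on imbalanced data.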

Year: 2022
DOI: 10.7717/peerj-cs.573
Venue: PeerJ Computer Science
Keywords: Software defect prediction, Machine learning, Class imbalance problem, Resampling methods, Statistical validation
DocType: Journal
Volume: 8
ISSN: 2376-5992
Citations: 0
PageRank: 0.34
References: 0
Authors: 2

Authors

Name	Order	Citations	PageRank
Ruchika Malhotra	1	533	35.12
Juhi Jain	2	0	0.34