Learning from Open-Source Projects: An Empirical Study on Defect Prediction - Citegraph

Paper Info

Title
Learning from Open-Source Projects: An Empirical Study on Defect Prediction

Abstract
The fundamental issue in cross-project defect prediction is selecting the most appropriate training data for creating quality defect predictors. Another concern is whether historical data from open-source projects can, from a practical point of view, be used to create quality predictors for proprietary projects. Current studies have proposed statistical approaches to finding such training data; however, no apparent effort has thus far been made to study their success on proprietary data. These methods also apply brute-force techniques, which are computationally expensive. In this work, we introduce a novel data selection procedure that takes into account the similarity between the distributions of the test data and the potential training data. Additionally, we use feature subset selection to increase the similarity between the test and training sets. Our procedure provides a comparable and scalable means of solving the cross-project defect prediction problem for creating quality defect predictors. To evaluate our procedure, we conducted empirical studies comparing it with within-company defect prediction and a relevancy filtering method. We found that our proposed method performs relatively better than the filtering method in terms of both computation cost and prediction performance.

Year	DOI	Venue
2013	10.1109/ESEM.2013.20	ESEM
Keywords	Field	DocType
public domain software,cross project defect prediction problem,data selection procedure,test distribution,open-source project learning,relevancy filtering method,statistical analysis,learning (artificial intelligence),company defect prediction,quality defect predictor creation,computation cost,program debugging,prediction performance,brute force techniques,test-training set similarity,statistical approach,project management,feature subset selection,cross-project,proprietary data,software defect prediction,data similarity,instance selection,training data,proprietary projects,learning artificial intelligence	Training set,Data modeling,Data mining,Computer science,Filter (signal processing),Cross project,Artificial intelligence,Machine learning,Empirical research,Scalability,Computation,Project management	Conference
Volume	Issue	ISSN
null	null	1938-6451
ISBN	Citations	PageRank
978-0-7695-5056-5	24	0.63
References	Authors
22	4

Authors (4 rows)

Name	Order	Citations	PageRank
Zhimin He	1	536	35.90
Fayola Peters	2	24	0.63
Tim Menzies	3	2886	151.44
Ye Yang	4	103	18.26
