Title
Risk prediction and risk factors identification from imbalanced data with RPMBGA+
Abstract
In this paper, we propose a new method for accurately predicting the risk of an event from imbalanced data, in which the majority class has far more instances than the minority class, and for identifying the features that are relevant to the target risk factor. To manage the trade-off between the prediction rates of the majority and minority classes, the method takes three input parameters, which supply either the costs of misclassifying an instance of the majority class and of the minority class, or the sensitivity threshold of the minority class. To obtain relevant features and to exploit prior information about the relationship between a feature and the target risk factor, we employ a probabilistic model-building genetic algorithm called RPMBGA+. Applying the proposed technique to the health checkup and lifestyle data of Toshiba Corporation, we found that it improves the sensitivity of the minority class and selects a very small number of informative features.
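The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the RPMBGA+ implementation: the function names, the default cost values, and the choice to apply the costs and the sensitivity threshold together are all assumptions made for illustration, and the confusion counts are invented. The minority class is treated as the positive class.

```python
def sensitivity(tp, fn):
    """Fraction of minority-class (positive) instances predicted correctly."""
    total = tp + fn
    return tp / total if total else 0.0


def misclassification_cost(fp, fn, cost_majority, cost_minority):
    """Total cost of errors.

    A false positive is a misclassified majority-class instance;
    a false negative is a misclassified minority-class instance.
    """
    return fp * cost_majority + fn * cost_minority


def score(tp, fn, fp, tn, cost_majority=1.0, cost_minority=10.0,
          min_sensitivity=0.8):
    """Return (meets_threshold, cost) for one classifier.

    The three keyword parameters are hypothetical stand-ins for the
    three input parameters mentioned in the abstract. tn is accepted
    for completeness but does not affect the score.
    """
    meets = sensitivity(tp, fn) >= min_sensitivity
    cost = misclassification_cost(fp, fn, cost_majority, cost_minority)
    return meets, cost


# Invented confusion counts: 1000 instances, 50 in the minority class.
conservative = score(tp=30, fn=20, fp=10, tn=940)
aggressive = score(tp=45, fn=5, fp=80, tn=870)
print(conservative)  # (False, 210.0)
print(aggressive)    # (True, 130.0)
```

With a higher cost on minority-class errors, the second classifier scores better even though it makes more errors overall, which is the behaviour the cost parameters are meant to encourage.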
Year: 2008
DOI: 10.1145/1388969.1389046
Venue: GECCO (Companion)
Keywords: relevant feature, target risk factor, minority class, risk factors identification, sensitivity threshold, proposed technique, risk prediction, lifestyle data, majority class, new method, imbalanced data, feature selection, probabilistic model, genetic algorithm, risk factors, classification
DocType: Conference
Citations: 1
PageRank: 0.39
References: 9
Authors: 5
Name | Order | Citations | PageRank
Topon K. Paul | 1 | 10 | 1.22
Ken Ueno | 2 | 124 | 13.27
Koichiro Iwata | 3 | 10 | 1.74
Toshio Hayashi | 4 | 10 | 1.74
Nobuyoshi Honda | 5 | 10 | 1.74