Title
Robust weighted kernel logistic regression in imbalanced and rare events data
Abstract
Recent developments in computing and technology, along with the availability of large amounts of raw data, have contributed to the creation of many effective techniques and algorithms in the fields of pattern recognition and machine learning. The main objectives for developing these algorithms include identifying patterns within the available data or making predictions, or both. Great success has been achieved with many classification techniques in real-life applications. With regard to binary data classification in particular, analysis of data containing rare events or disproportionate class distributions poses a great challenge to industry and to the machine learning community. This study examines rare events (REs) with binary dependent variables containing many more non-events (zeros) than events (ones). These variables are difficult to predict and to explain, as the literature has shown. This research combines rare events corrections to Logistic Regression (LR) with truncated Newton methods and applies these techniques to Kernel Logistic Regression (KLR). The resulting model, Rare Event Weighted Kernel Logistic Regression (RE-WKLR), is a combination of weighting, regularization, approximate numerical methods, kernelization, bias correction, and efficient implementation, all of which are critical to enabling RE-WKLR to be an effective and powerful method for predicting rare events. Comparing RE-WKLR to support vector machines (SVM) and TR-KLR on non-linearly separable, small and large binary rare event datasets, we find that RE-WKLR is as fast as TR-KLR and much faster than SVM. In addition, according to the statistical significance test, RE-WKLR is more accurate than both SVM and TR-KLR.
Year: 2011
DOI: 10.1016/j.csda.2010.06.014
Venue: Computational Statistics & Data Analysis
Keywords: large binary rare event, logistic regression, rare events data, truncated newton, kernel methods, kernel logistic regression, enabling re-wklr, binary dependent variable, robust weighted kernel logistic, rare event, data classification, available data, classification, rare events correction, endogenous sampling, raw data, statistical significance, power method, machine learning, kernel method, industrial engineering, numerical method, pattern recognition
Field: Kernelization, Econometrics, Weighting, Support vector machine, Variables, Binary data, Kernel method, Statistics, Logistic regression, Rare events, Mathematics
DocType: Journal
Volume: 55
Issue: 1
Journal: Computational Statistics and Data Analysis
Citations: 16
PageRank: 0.93
References: 23
Authors: 2
Name                  Order  Citations  PageRank
Maher Maalouf         1      48         5.36
Theodore B. Trafalis  2      199        21.77