Title
wRACOG: A Gibbs Sampling-Based Oversampling Technique
Abstract
As machine learning techniques mature and are used to tackle complex scientific problems, challenges arise such as the imbalanced class distribution problem, where one of the target class labels is under-represented in comparison with the other classes. Existing oversampling approaches for addressing this problem typically do not consider the probability distribution of the minority class while synthetically generating new samples. As a result, the minority class is not well represented, which leads to high misclassification error. We introduce wRACOG, a Gibbs sampling-based oversampling approach that synthetically generates and strategically selects new minority class samples. The Gibbs sampler uses the joint probability distribution of data attributes to generate new minority class samples in the form of a Markov chain. wRACOG iteratively learns a model by selecting samples from the Markov chain that have the highest probability of being misclassified. We validate the effectiveness of wRACOG using five UCI datasets and one new application domain dataset. A comparative study of wRACOG with three other well-known resampling methods provides evidence that wRACOG offers a definite improvement in classification accuracy for minority class samples over other methods.
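The Gibbs sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes discrete attributes coded as integers in `0..n_values-1`, and it assumes the joint distribution factorizes as a simple chain in which each attribute depends only on the previous one, a simplification chosen here for brevity. The function names `fit_chain` and `gibbs_oversample`, and the parameters `alpha`, `burn_in`, and `lag`, are hypothetical.

```python
import numpy as np

def fit_chain(X, n_values, alpha=1.0):
    """Estimate P(x_0) and P(x_k | x_{k-1}) from minority samples.

    Assumption: a first-order chain over attributes, with Laplace smoothing.
    """
    n, d = X.shape
    prior = np.bincount(X[:, 0], minlength=n_values) + float(alpha)
    prior = prior / prior.sum()
    # trans[k-1][a, b] approximates P(x_k = b | x_{k-1} = a)
    trans = np.full((max(d - 1, 0), n_values, n_values), float(alpha))
    for k in range(1, d):
        np.add.at(trans[k - 1], (X[:, k - 1], X[:, k]), 1.0)
    trans /= trans.sum(axis=2, keepdims=True)
    return prior, trans

def gibbs_oversample(X, n_values, n_new, burn_in=100, lag=10, seed=0):
    """Draw n_new synthetic minority samples from a Gibbs-sampled Markov chain."""
    rng = np.random.default_rng(seed)
    prior, trans = fit_chain(X, n_values)
    d = X.shape[1]
    x = X[rng.integers(len(X))].copy()  # start the chain at a real minority sample
    samples, step = [], 0
    while len(samples) < n_new:
        for i in range(d):
            # Full conditional of attribute i given all others under the chain model:
            # proportional to P(x_i | x_{i-1}) * P(x_{i+1} | x_i)
            p = prior.copy() if i == 0 else trans[i - 1, x[i - 1], :].copy()
            if i < d - 1:
                p *= trans[i, :, x[i + 1]]
            p /= p.sum()
            x[i] = rng.choice(n_values, p=p)
        step += 1
        # Discard burn-in sweeps, then keep every lag-th state to reduce correlation
        if step > burn_in and (step - burn_in) % lag == 0:
            samples.append(x.copy())
    return np.array(samples)

# Toy minority class with three binary attributes
X_min = np.array([[0, 1, 1], [0, 1, 0], [1, 0, 1], [0, 1, 1]])
X_new = gibbs_oversample(X_min, n_values=2, n_new=5)
```

This sketch covers only sample generation. The selection step in the abstract, which keeps the generated samples most likely to be misclassified by the current model, would need a classifier in the loop and is not shown.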
Year
2013
DOI
10.1109/ICDM.2013.18
Venue
Data Mining
Keywords
Markov processes,iterative methods,learning (artificial intelligence),pattern classification,sampling methods,statistical distributions,Gibbs sampling-based oversampling technique,Markov chain,UCI datasets,application domain dataset,classification accuracy improvement,data attributes,imbalanced class distribution problem,iterative learning,machine learning techniques,misclassification error,probability distribution,strategically selected minority class,synthetically generated minority class,target class labels,wRACOG,Gibbs sampling,Imbalanced class distribution,Markov chain Monte Carlo (MCMC),oversampling
Field
Data mining,Random variable,Joint probability distribution,Markov process,Computer science,Markov chain,Probability distribution,Sampling (statistics),Artificial intelligence,Resampling,Machine learning,Gibbs sampling
DocType
Conference
ISSN
1550-4786
Citations
1
PageRank
0.35
References
17
Authors
3
Name
Order
Citations
PageRank
Barnan Das11759.79
Narayanan C. Krishnan239217.46
Diane J. Cook35052596.13