Title
SMOTE-GPU: Big Data preprocessing on commodity hardware for imbalanced classification.
Abstract
Working with large amounts of data is now common, since our capacity to collect and store information has increased significantly. The extraction of knowledge from such data is commonly known as “Big Data,” and it is performed on large clusters with MapReduce platforms. Imbalanced classification poses a problem in both traditional and Big Data learning scenarios. Data sampling is one way to improve performance on imbalanced problems. A method for Big Data problems that runs on commodity hardware can offload these computations from the expensive, high-demand hardware that MapReduce platforms require. The characteristics of some sampling methods make them suitable for adaptation to commodity hardware, taking advantage of the parallel computation capabilities of graphics processing units. SMOTE, one of the most popular oversampling methods, is based on the nearest neighbor rule. The proposed SMOTE-GPU efficiently handles large datasets (several million instances) on a wide variety of commodity hardware, including a laptop computer.
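The abstract names SMOTE and its reliance on the nearest neighbor rule without showing the mechanics. The following is a minimal CPU sketch of the generic SMOTE idea: pick a minority sample, find its k nearest minority neighbors, and interpolate toward one of them. It is not the GPU implementation described in the paper; the function name `smote`, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from minority-class feature vectors.

    Illustrative sketch only: brute-force neighbour search on the CPU,
    not the paper's GPU method.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank every other minority point by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

The neighbor search dominates the cost here, since each base point is compared against every other minority sample. That distance computation is independent per pair, which is the kind of work that maps naturally onto the parallel hardware the abstract mentions.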
Year
2017
DOI
10.1007/s13748-017-0128-2
Venue
Progress in AI
Keywords
Imbalanced classification, SMOTE, CUDA, Big Data
Field
Graphics, k-nearest neighbors algorithm, Data mining, Oversampling, Laptop, CUDA, Computer science, Preprocessor, Sampling (statistics), Artificial intelligence, Big data, Machine learning
DocType
Journal
Volume
6
Issue
4
ISSN
2192-6352
Citations
2
PageRank
0.37
References
9
Authors
4
Name | Order | Citations | PageRank
Pablo D. Gutiérrez | 1 | 21 | 1.35
Miguel Lastra | 2 | 82 | 6.86
José Manuel Benítez | 3 | 888 | 56.02
Francisco Herrera | 4 | 27391 | 1168.49