Title
Empirical analysis of asymptotic ensemble learning for big data.
Abstract
In many application areas, the data being generated and processed exceeds the petabyte scale. Analyzing such an increasingly massive volume of data poses both computational and statistical challenges. To address these challenges, distributed and parallel processing frameworks have been used to implement scalable data analysis algorithms. Nevertheless, processing an entire big data set at once may exceed the available computing resources and the time constraints of some applications. Approximate approaches can therefore be used to obtain asymptotic analysis results, especially when a data analysis algorithm can tolerate an approximate result rather than an exact one. However, most approximation approaches require taking a random sample of the data, which is a nontrivial task for big data sets. In this paper, we employ ensemble learning as an approach to asymptotic analysis using randomly selected subsets (i.e., data blocks) of a big data set. We propose an asymptotic ensemble learning framework that relies on block-based sampling rather than record-based sampling. To demonstrate the feasibility and performance of this framework, we present an empirical analysis on real data sets. In addition to the scalability advantage, the experimental results show that several blocks of a data set are enough to obtain approximately the same results as those obtained from the whole data set.
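The block-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid base learner, the majority vote, the synthetic two-class data, and every function name below are assumptions chosen for illustration. The sketch also assumes each block is itself a random sample of the whole data set, which is simulated here by shuffling once before partitioning.

```python
import random
from collections import Counter


def make_synthetic(n, seed=0):
    """Hypothetical data: two Gaussian classes centred at (0, 0) and (4, 4)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 4.0
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], label))
    return data


def split_into_blocks(records, num_blocks, seed=0):
    """Shuffle once, then cut the records into disjoint blocks.

    Assumption: the shuffle makes each block a random sample of the data.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


def train_centroid_model(block):
    """Illustrative base learner: the per-class mean of the feature vectors."""
    sums, counts = {}, Counter()
    for features, label in block:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_one(model, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], features)),
    )


def train_block_ensemble(blocks, num_selected, seed=0):
    """Sample whole blocks (not individual records) and fit one model per block."""
    rng = random.Random(seed)
    selected = rng.sample(blocks, num_selected)
    return [train_centroid_model(block) for block in selected]


def ensemble_predict(models, features):
    """Combine the block-level models by majority vote."""
    votes = Counter(predict_one(model, features) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    blocks = split_into_blocks(make_synthetic(2000, seed=1), num_blocks=20, seed=1)
    models = train_block_ensemble(blocks, num_selected=5, seed=1)
    held_out = make_synthetic(500, seed=2)
    correct = sum(ensemble_predict(models, x) == y for x, y in held_out)
    print(f"accuracy using 5 of 20 blocks: {correct / len(held_out):.3f}")
```

Only the selected blocks are ever read during training, which is the scalability point the abstract makes: the cost grows with the number of blocks used, not with the size of the full data set.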
Year
2016
DOI
10.1145/3006299.3006306
Venue
Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
Keywords
Big Data, Distributed and Parallel Processing, Ensemble Learning, Randomness, Asymptotic Analysis
Field
Data modeling, Data mining, Data set, Petabyte, Computer science, Theoretical computer science, Artificial intelligence, Asymptotic analysis, Ensemble learning, Sampling (statistics), Big data, Machine learning, Scalability
DocType
Conference
ISBN
978-1-5090-4468-9
Citations
3
PageRank
0.44
References
12
Authors
3
Name                  Order  Citations  PageRank
Salman Salloum        1      15         1.72
Joshua Zhexue Huang   2      1365       82.64
Yu-Lin He             3      90         6.31