Title
Parallel Implementation of Chi2 Algorithm in MapReduce Framework
Abstract
Discretization of continuous attributes is an important preprocessing step in machine learning and data mining. Discretizing the continuous attributes of massive data sets efficiently has become a pressing problem. Hadoop, which has risen to prominence in recent years, can efficiently run many applications over massive data. This paper designs and implements a parallel Chi2-based discretization algorithm on the MapReduce model. With discretization efficiency as the focus, experiments were run on data sets of different sizes and on different numbers of nodes. The results show that the proposed algorithm is efficient and scales well when discretizing the continuous attributes of massive data.
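The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the usual Chi2/ChiMerge formulation: a map step emits a count for each (attribute value, class label) pair, a reduce step sums those counts, and a chi-square statistic is then computed for each pair of adjacent intervals to decide whether they can be merged. The function names (map_record, reduce_counts, chi2_adjacent) are hypothetical, the map and reduce steps are simulated in plain Python rather than run on Hadoop, and cells with a zero expected count are simply skipped, which is a simplification.

```python
from collections import Counter
from itertools import chain


def map_record(value, label):
    """Map step (simulated): emit one ((value, label), 1) pair per record."""
    yield (value, label), 1


def reduce_counts(pairs):
    """Reduce step (simulated): sum the emitted counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def chi2_adjacent(counts_a, counts_b, labels):
    """Chi-square statistic for two adjacent intervals.

    counts_a and counts_b map class label -> observed frequency.
    Cells whose expected count is zero are skipped (a simplification).
    """
    rows = [counts_a, counts_b]
    row_totals = [sum(r.get(c, 0) for c in labels) for r in rows]
    col_totals = {c: sum(r.get(c, 0) for r in rows) for c in labels}
    n = sum(row_totals)
    if n == 0:
        return 0.0
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for c in labels:
            expected = row_total * col_totals[c] / n
            if expected > 0:
                stat += (row.get(c, 0) - expected) ** 2 / expected
    return stat


# Toy data: (attribute value, class label) records, invented for illustration.
records = [(1.0, "a"), (1.0, "a"), (2.0, "b"), (2.0, "b")]
emitted = chain.from_iterable(map_record(v, y) for v, y in records)
counts = reduce_counts(emitted)

# Regroup the reduced counts into one class histogram per attribute value.
per_value = {}
for (value, label), count in counts.items():
    per_value.setdefault(value, {})[label] = count

stat = chi2_adjacent(per_value[1.0], per_value[2.0], ["a", "b"])
print(stat)
```

A low statistic suggests the two intervals have similar class distributions and are candidates for merging; in the toy data the two values separate the classes perfectly, so the statistic is high and the boundary between them would be kept.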
Year
2014
DOI
10.1007/978-3-319-15554-8_83
Venue
HUMAN CENTERED COMPUTING, HCC 2014
Keywords
Hadoop, MapReduce, Chi2 algorithm, Large-scale data, Discretization
Field
Discretization, Data set, Computer science, Parallel computing, Algorithm, Scalability
DocType
Conference
Volume
8944
ISSN
0302-9743
Citations
1
PageRank
0.36
References
4
Authors
3
Name            Order  Citations  PageRank
Yong Zhang      1      14         4.98
Jingwen Yu      2      1          0.36
Jianying Wang   3      1          0.36