Abstract |
---|
The discretization of continuous attributes is an important preprocessing step for machine learning and data mining. Efficiently discretizing the continuous attributes of massive data sets has become an urgent problem. Hadoop, a technique that has risen rapidly in recent years, can efficiently process many applications built on massive data. This paper designs and implements a parallel Chi2-based discretization algorithm on the MapReduce model. Without sacrificing discretization quality, experiments were conducted with data sets of different sizes on different numbers of nodes. The experimental results show that the proposed algorithm achieves high efficiency and good scalability when discretizing the continuous attributes of massive data. |
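As background for the kind of chi-square-based bottom-up merging that the Chi2 family of algorithms builds on, the following is a minimal serial ChiMerge-style sketch, not the paper's parallel MapReduce implementation. The function names and the default threshold of 3.84 (the χ² critical value at 0.05 significance with one degree of freedom) are illustrative assumptions:

```python
from collections import Counter

def chi2_stat(a, b):
    """Chi-square statistic for two adjacent intervals.
    a, b: Counter mapping class label -> count in each interval."""
    classes = set(a) | set(b)
    n_a, n_b = sum(a.values()), sum(b.values())
    total = n_a + n_b
    stat = 0.0
    for c in classes:
        col = a.get(c, 0) + b.get(c, 0)          # column total for class c
        for obs, n in ((a.get(c, 0), n_a), (b.get(c, 0), n_b)):
            exp = n * col / total                 # expected count
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat

def chimerge(values, labels, threshold=3.84, min_intervals=2):
    """ChiMerge-style bottom-up discretization: start with one interval
    per distinct value, then repeatedly merge the adjacent pair with the
    smallest chi-square until every pair exceeds the threshold (or only
    min_intervals remain).  Returns a list of (low, high) intervals."""
    pts = sorted(set(values))
    intervals = [[v, v, Counter()] for v in pts]
    idx = {v: i for i, v in enumerate(pts)}
    for v, y in zip(values, labels):
        intervals[idx[v]][2][y] += 1
    while len(intervals) > min_intervals:
        stats = [chi2_stat(intervals[i][2], intervals[i + 1][2])
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break                                 # all pairs are distinct enough
        lo, _, ca = intervals[i]
        _, hi, cb = intervals[i + 1]
        intervals[i:i + 2] = [[lo, hi, ca + cb]]  # merge the pair
    return [(lo, hi) for lo, hi, _ in intervals]
```

A MapReduce parallelization of this idea would typically compute the per-interval class counts in the map phase and perform the chi-square merging over the aggregated counts in the reduce phase; the exact partitioning used by the paper is not shown here.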
Year | DOI | Venue |
---|---|---
2014 | 10.1007/978-3-319-15554-8_83 | HUMAN CENTERED COMPUTING, HCC 2014 |
Keywords | Field | DocType
---|---|---
Hadoop, MapReduce, Chi2 algorithm, Large-scale data, Discretization | Discretization, Data set, Computer science, Parallel computing, Algorithm, Scalability | Conference

Volume | ISSN | Citations
---|---|---
8944 | 0302-9743 | 1

PageRank | References | Authors
---|---|---
0.36 | 4 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Yong Zhang | 1 | 14 | 4.98 |
Jingwen Yu | 2 | 1 | 0.36 |
Jianying Wang | 3 | 1 | 0.36 |