Title
Minimum Spanning Tree Based Classification Model for Massive Data with MapReduce Implementation
Abstract
The rapid growth of data provides more information, yet it challenges traditional techniques for extracting useful knowledge. In this paper, we propose MCMM, a Minimum spanning tree (MST) based Classification model for Massive data with a MapReduce implementation. It can be viewed as an intermediate model between the traditional K nearest neighbor (KNN) method and cluster-based classification methods, aiming to overcome their disadvantages and cope with large amounts of data. Our model is implemented on the Hadoop platform using its MapReduce programming framework, which is particularly suitable for cloud computing. We ran experiments on several data sets, including real-world data from the UCI repository and synthetic data, on Downing 4000 clusters running Hadoop. The results show that our model generally outperforms KNN and some other classification methods with respect to accuracy and scalability.
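The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how an MST can sit between KNN (which keeps every training point) and cluster-based classification (which keeps one representative per cluster). It assumes one plausible construction, not necessarily the authors': build an MST over the training points, cut every edge whose endpoints carry different labels, summarize each remaining component by its centroid, and classify a query by its nearest centroid. The names mst_edges, fit, and predict are hypothetical, and the sketch runs on a single machine with no MapReduce.

```python
import math
from collections import defaultdict


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns MST edges (i, j)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def fit(points, labels):
    """Assumed construction: drop MST edges joining different labels,
    then return one (centroid, label) pair per remaining component."""
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for i, j in mst_edges(points):
        if labels[i] == labels[j]:      # keep only same-label edges
            root[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)

    model = []
    for members in groups.values():
        dim = len(points[members[0]])
        centroid = tuple(
            sum(points[i][k] for i in members) / len(members) for k in range(dim)
        )
        model.append((centroid, labels[members[0]]))
    return model


def predict(model, x):
    """Label of the nearest component centroid."""
    return min(model, key=lambda entry: math.dist(entry[0], x))[1]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    lbl = ["a", "a", "a", "b", "b", "b"]
    m = fit(pts, lbl)
    print(len(m), predict(m, (0.5, 0.5)), predict(m, (10.2, 10.2)))
```

In this sketch the stored model shrinks from all training points to one centroid per same-label component, which is the sense in which it lies between the two baselines. How the paper distributes the work across MapReduce tasks is not stated in the abstract and is not modeled here.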
Year
2010
DOI
10.1109/ICDMW.2010.14
Venue
ICDM Workshops
Keywords
mapreduce programming framework, massive data, classification method, real world data, mapreduce implementation, neighbor method, classification model, minimum spanning tree, intermediate model, synthetic data, hadoop platform, classification, cloud computing, measurement, computational modeling, distributed algorithms, data mining, accuracy, k nearest neighbor, data models
Field
Data modeling, Data mining, Data set, Computer science, Distributed algorithm, Synthetic data, Artificial intelligence, Machine learning, Software framework, Scalability, Cloud computing, Minimum spanning tree
DocType
Conference
Citations
4
PageRank
0.40
References
7
Authors
5
Name
Order
Citations
PageRank
Jin Chang140.74
Jun Luo222226.61
Zhexue Huang31169102.63
Shengzhong Feng473350.59
Jianping Fan52677192.33