Title
A Bayes Classifier-Based OVFDT Algorithm for Massive Stream Data Mining on Big Data Platform.
Abstract
Online incremental data mining has recently become a rapidly growing research area within stream data mining. The VFDT algorithm, an incremental decision tree classification algorithm, is widely used in online data mining. To optimize VFDT, a dynamic tie-breaking threshold strategy and a pre-pruning mechanism are used to reduce the scale of the decision tree. Furthermore, a Bayes classifier is applied at the leaf nodes of the Hoeffding decision tree, which improves classification accuracy. In this paper, the improved algorithm is called OVFDT (Optimized VFDT). To improve the performance of OVFDT on massive streaming data, an implementation scheme of OVFDT on the MapReduce platform is proposed. To meet the need for real-time computing, an implementation scheme on the Storm platform is also designed. Three comparison experiments are designed to compare the scale of the generated decision trees, the classification accuracy, and the execution time of the three algorithms. The simulation results show that, compared with the C4.5 and VFDT algorithms, OVFDT effectively reduces the scale of the decision tree and also improves classification accuracy.
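The split test behind a Hoeffding tree such as VFDT can be sketched as follows. This is a minimal illustration of the standard Hoeffding bound with a fixed tie-breaking threshold, not the OVFDT implementation from the paper: the abstract does not specify how the dynamic threshold is computed, so `tau` is a plain constant here, and the function names, the default `delta`, and the assumption that the gain measure has range 1 (information gain with two classes) are all hypothetical choices made for this example.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound epsilon after n observations of a variable with the given range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 value_range: float = 1.0, delta: float = 1e-7,
                 tau: float = 0.05) -> bool:
    """Decide whether a leaf that has seen n examples should be split.

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon has shrunk below the tie-breaking threshold tau
    (assumed fixed here; the paper's dynamic rule is not given in the abstract).
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tau


if __name__ == "__main__":
    print(should_split(0.30, 0.05, n=200))    # clear winner among attributes
    print(should_split(0.30, 0.25, n=200))    # too close to call, bound still wide
    print(should_split(0.30, 0.29, n=10000))  # near tie, resolved by tau
```

With a fixed `tau`, near-ties between attributes are resolved only once enough examples have arrived for epsilon to fall below the threshold; a larger `tau` splits sooner and tends to grow a larger tree, which is the trade-off a dynamic threshold is meant to manage.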
Year: 2017
DOI: 10.1007/978-3-319-61566-0_49
Venue: COMPLEX, INTELLIGENT, AND SOFTWARE INTENSIVE SYSTEMS, CISIS-2017
Field: Data mining, Decision tree, Data stream mining, Computer science, Algorithm, Stream data, Streaming data, Execution time, Big data, Bayes classifier, Incremental decision tree
DocType: Conference
Volume: 611
ISSN: 2194-5357
Citations: 0
PageRank: 0.34
References: 12
Authors: 4
Name
Order
Citations
PageRank
Liangde Li100.68
Peng Li2254.13
he xu33622.25
Fangzhou Chen401.01