Title
A Streaming Parallel Decision Tree Algorithm
Abstract
We propose a new algorithm for building decision tree classifiers. The algorithm is executed in a distributed environment and is especially designed for classifying large data sets and streaming data. It is empirically shown to be as accurate as a standard decision tree classifier, while being scalable for processing of streaming data on multiple processors. These findings are supported by a rigorous analysis of the algorithm's accuracy. The essence of the algorithm is to quickly construct histograms at the processors, which compress the data to a fixed amount of memory. A master processor uses this information to find near-optimal split points to terminal tree nodes. Our analysis shows that guarantees on the local accuracy of split points imply guarantees on the overall tree accuracy.
Year
DOI
Venue
2010
10.1145/1756006.1756034
Journal of Machine Learning Research
Keywords
Field
DocType
scalability,standard decision tree classifier,local accuracy,streaming parallel decision tree,split point,near-optimal split point,new algorithm,decision tree classifier,overall tree accuracy,rigorous analysis,decision tree classifiers,large data set,streami ng data,distributed computing,terminal tree node,decision tree,distributed environment
Data mining,Tree traversal,Computer science,C4.5 algorithm,Artificial intelligence,Segment tree,ID3 algorithm,Fractal tree index,Machine learning,Interval tree,Incremental decision tree,Search tree
Journal
Volume
ISSN
Citations 
11,
1532-4435
54
PageRank 
References 
Authors
1.77
20
2
Name
Order
Citations
PageRank
Y. Ben-Haim11408.29
Elad Tom-Tov2541.77