Title
A new decision tree classification method for mining high-speed data streams based on threaded binary search trees
Abstract
One of most important algorithms for mining data streams is VFDT. It uses Hoeffding inequality to achieve a probabilistic bound on the accuracy of the tree constructed. Gama et al. have extended VFDT in two directions. Their system VFDTc can deal with continuous data and use more powerful classification techniques at tree leaves. In this paper, we revisit this problem and implemented a system VFDTt on top of VFDT and VFDTc. We make the following three contributions: 1) we present a threaded binary search trees (TBST) approach for efficiently handling continuous attributes. It builds a threaded binary search tree, and its processing time for values inserting is O(nlogn), while VFDT's processing time is O(n$sup2$esup). When a new example arrives, VFDTc need update O(logn) attribute tree nodes, but VFDTt just need update one necessary node.2) we improve the method of getting the best split-test point of a given continuous attribute. Comparing to the method used in VFDTc, it improves from O(nlogn) to O (n) in processing time. 3) Comparing to VFDTc, VFDTt's candidate split-test number decrease from O(n) to O(logn). Comparing to VFDT, the most relevant property of our system is an average reduction of 25.53% in processing time, while keep the same tree size and accuracy. Overall, the techniques introduced here significantly improve the efficiency of decision tree classification on data streams.
Year
DOI
Venue
2007
10.1007/978-3-540-77018-3_27
PAKDD Workshops
Keywords
Field
DocType
processing time,new decision tree classification,tree size,threaded binary search tree,decision tree classification,extended vfdt,continuous data,system vfdtc,continuous attribute,high-speed data stream,attribute tree node,data stream,binary search tree,decision tree,data streams
Data mining,Range tree,Tree traversal,Computer science,K-ary tree,Threaded binary tree,Optimal binary search tree,Red–black tree,Search tree,Interval tree
Conference
Volume
ISSN
ISBN
4819
0302-9743
3-540-77016-X
Citations 
PageRank 
References 
4
0.42
22
Authors
5
Name
Order
Citations
PageRank
Tao Wang1333.91
Zhoujun Li2964115.99
Xiaohua Hu32819314.15
Yuejin Yan4253.06
Huo-wang Chen523533.47