Abstract |
---|
Decision Tree is a state-of-the-art classification and prediction algorithm in machine learning that constructs a tree-structured set of attributes. Its distributed implementation, the Distributed Decision Tree, generates a specified number of trees (depending on the number of partitions of the input dataset) and, at the end, collects votes or averages the predictions or classifications. Here, the overall degree of parallelism depends on the number of partitions, so parallelism can be achieved by properly tuning the partition count. However, this setup in turn compromises accuracy, because there is always a trade-off between accuracy and partition size. Therefore, in this paper, we propose an improved Distributed Decision Tree algorithm that achieves true parallelism without loss of accuracy. The improved Distributed Decision Tree is implemented using the open-source distributed frameworks Hadoop and Spark. We measure learning time, tree size, and accuracy to establish benchmarks using medium to large datasets. |
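The abstract's scheme of training one tree per data partition and then collecting votes can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's implementation: `train_stub` stands in for actual decision-tree induction (it just predicts the partition's majority label), and the round-robin `partition` helper is an assumption about how data might be split.

```python
# Hypothetical sketch of per-partition training plus majority-vote
# aggregation, as described in the abstract (not the paper's code).
from collections import Counter

def partition(dataset, n_parts):
    """Split the dataset round-robin into n_parts partitions."""
    parts = [[] for _ in range(n_parts)]
    for i, row in enumerate(dataset):
        parts[i % n_parts].append(row)
    return parts

def train_stub(part):
    """Stand-in for tree induction: always predict the partition's
    majority label, ignoring the features (illustration only)."""
    majority = Counter(label for _, label in part).most_common(1)[0][0]
    return lambda features: majority

def ensemble_predict(models, features):
    """Collect one vote per per-partition model; return the majority."""
    votes = Counter(m(features) for m in models)
    return votes.most_common(1)[0][0]

# Toy dataset: (features, label) pairs.
data = [((0,), "a"), ((1,), "a"), ((2,), "b"),
        ((3,), "a"), ((4,), "a"), ((5,), "b"),
        ((6,), "b"), ((7,), "b"), ((8,), "a")]
models = [train_stub(p) for p in partition(data, 3)]
print(ensemble_predict(models, (9,)))  # prints "a" (votes: a, a, b)
```

The tension the paper targets is visible even here: more partitions means more parallel workers, but each `train_stub` sees less data, which in a real tree learner degrades per-tree accuracy.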
Year | DOI | Venue
---|---|---|
2017 | 10.1109/BigData.2017.8258011 | 2017 IEEE International Conference on Big Data (Big Data) |
Keywords | DocType | ISSN
---|---|---|
distributed decision tree, decision tree, spark, hadoop | Conference | 2639-1589 |
Citations | PageRank | References
---|---|---|
0 | 0.34 | 0 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ankit Desai | 1 | 0 | 0.68 |
Sanjay Chaudhary | 2 | 223 | 24.16 |