Abstract | ||
---|---|---|
Traditional parallel algorithms for mining frequent itemsets aim to balance load by equally partitioning data among a group of computing nodes. We start this study by discovering a serious performance problem of the existing parallel Frequent Itemset Mining algorithms. Given a large dataset, data partitioning strategies in the existing solutions suffer from high communication and mining overhead induced by redundant transactions transmitted among computing nodes. We address this problem by developing a data partitioning approach called FiDoop-DP using the MapReduce programming model. The overarching goal of FiDoop-DP is to boost the performance of parallel Frequent Itemset Mining on Hadoop clusters. At the heart of FiDoop-DP is the Voronoi diagram-based data partitioning technique, which exploits correlations among transactions. Incorporating the similarity metric and the Locality-Sensitive Hashing technique, FiDoop-DP places highly similar transactions into a data partition to improve locality without creating an excessive number of redundant transactions. We implement FiDoop-DP on a 24-node Hadoop cluster, driven by a wide range of datasets created by the IBM Quest Market-Basket Synthetic Data Generator. Experimental results reveal that FiDoop-DP reduces network and computing loads by eliminating redundant transactions on Hadoop nodes. FiDoop-DP improves the performance of the existing parallel frequent-pattern scheme by up to 31 percent, with an average of 18 percent. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/TPDS.2016.2560176 | IEEE Trans. Parallel Distrib. Syst. |
Keywords | Field | DocType |
Data mining,Itemsets,Partitioning algorithms,Programming,Computational modeling,Distributed databases,Correlation | Data mining,Locality,Programming paradigm,Parallel algorithm,Computer science,Exploit,Synthetic data,Hash function,Voronoi diagram,Distributed database,Distributed computing | Journal |
Volume | Issue | ISSN |
28 | 1 | 1045-9219 |
Citations | PageRank | References |
8 | 0.47 | 21 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yaling Xun | 1 | 16 | 3.31 |
Jifu Zhang | 2 | 95 | 19.42 |
Xiao Qin | 3 | 1836 | 125.69 |
Xujun Zhao | 4 | 30 | 4.31 |
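
The partitioning idea described in the abstract (grouping similar transactions around pivots, with Locality-Sensitive Hashing to find similar transactions cheaply) can be illustrated with a minimal sketch. This is not the authors' implementation and it does not use MapReduce. It assumes Jaccard similarity as the similarity metric, MinHash with banding as the LSH scheme, and pivots supplied by the caller; the function names `jaccard`, `minhash_signature`, `lsh_buckets`, and `voronoi_partition` are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two item sets (defined as 1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(items, num_hashes=8):
    """MinHash signature: for each seeded hash, keep the minimum value over the items.

    Assumes `items` is a non-empty set of hashable, printable items.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int(hashlib.md5(f"{seed}:{item}".encode()).hexdigest(), 16)
            for item in items
        ))
    return tuple(signature)


def lsh_buckets(transactions, num_hashes=8, bands=4):
    """Banding LSH: transactions whose signatures agree on a whole band share a bucket."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for idx, transaction in enumerate(transactions):
        sig = minhash_signature(transaction, num_hashes)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].add(idx)
    return buckets


def voronoi_partition(transactions, pivots):
    """Assign each transaction to the pivot it is most similar to (a Voronoi cell
    under Jaccard distance). Ties go to the pivot with the lowest index."""
    partitions = defaultdict(list)
    for transaction in transactions:
        best = max(range(len(pivots)), key=lambda i: jaccard(transaction, pivots[i]))
        partitions[best].append(transaction)
    return dict(partitions)


# Illustrative data only: two groups of market-basket style transactions.
transactions = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y", "z"}, {"x", "y", "w"}]
pivots = [{"a", "b"}, {"x", "y"}]  # hypothetical pivots, picked by hand
partitions = voronoi_partition(transactions, pivots)
buckets = lsh_buckets(transactions)
```

In this toy run the two transactions sharing `a` and `b` land in one partition and the two sharing `x` and `y` in the other, so each partition can be mined with little overlap. How pivots are selected and how LSH buckets feed that selection are left open here, since the abstract does not specify them.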