Title
Data Partitioning for Fast Mining of Frequent Itemsets in Massively Distributed Environments.
Abstract
Frequent itemset mining (FIM) is one of the cornerstones of data mining. Although the FIM problem has been thoroughly studied, few solutions, standard or improved, scale. This is mainly the case when (i) the amount of data is very large and/or (ii) the minimum support (MinSup) threshold is very low. In this paper, we propose a highly scalable parallel frequent itemset mining (PFIM) algorithm, namely Parallel Absolute Top Down (PATD). PATD makes the mining of very large databases (up to terabytes of data) simple and compact. Its mining process consists of a single parallel job, which dramatically reduces the mining runtime, the communication cost, and the energy consumption on a distributed computing platform. Based on an efficient data partitioning strategy, namely Item Based Data Partitioning (IBDP), PATD mines each data partition independently, relying on an absolute minimum support (AMinSup) instead of a relative one. PATD has been extensively evaluated on real-world data sets. Our experimental results suggest that PATD is significantly more efficient and scalable than alternative approaches.
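The idea of mining item-based partitions independently against an absolute support threshold can be illustrated with a small sketch. This is not the authors' IBDP or PATD implementation, whose details the abstract does not give. It assumes a simple hypothetical scheme: each item is assigned to one partition, a transaction is copied to every partition that owns one of its items, and each partition counts only the itemsets whose smallest item it owns. Under that assumption, local counts equal global counts, so an absolute threshold can be applied without any cross-partition merge. The function names (`partition_by_item`, `mine_partition`, `mine`) and the `max_len` cap are illustrative only.

```python
from collections import Counter
from itertools import combinations


def partition_by_item(transactions, num_partitions):
    """Copy each transaction to every partition that owns one of its items.

    Ownership is a hypothetical round-robin assignment over the sorted items.
    """
    items = sorted({item for t in transactions for item in t})
    owner = {item: idx % num_partitions for idx, item in enumerate(items)}
    parts = [[] for _ in range(num_partitions)]
    for t in transactions:
        # A set of partition ids, so a transaction lands at most once per partition.
        for pid in {owner[item] for item in t}:
            parts[pid].append(frozenset(t))
    return parts, owner


def mine_partition(part, owner, pid, abs_min_sup, max_len=3):
    """Count itemsets owned by this partition and keep the frequent ones.

    An itemset is owned by the partition that owns its smallest item, so every
    transaction containing the itemset is present locally and the count is exact.
    """
    counts = Counter()
    for t in part:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(t), k):
                if owner[combo[0]] == pid:
                    counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= abs_min_sup}


def mine(transactions, num_partitions, abs_min_sup, max_len=3):
    """Mine every partition independently and take the union of the results."""
    parts, owner = partition_by_item(transactions, num_partitions)
    frequent = {}
    for pid, part in enumerate(parts):
        frequent.update(mine_partition(part, owner, pid, abs_min_sup, max_len))
    return frequent


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(mine(data, num_partitions=2, abs_min_sup=3).items()):
        print(itemset, count)
```

The brute-force enumeration inside `mine_partition` is only suitable for toy inputs. The point of the sketch is the partitioning property: because no itemset is split across partitions, each partition can be processed on its own in a single pass.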
Year
2015
DOI
10.1007/978-3-319-22849-5_21
Venue
DEXA
Field
Data mining, Data set, Computer science, Terabyte, Top-down and bottom-up design, Big data, Data partitioning, Database, Scalability, Power consumption
DocType
Conference
Citations
1
PageRank
0.36
References
10
Authors
3
Name               Order  Citations  PageRank
Saber Salah        1      1          0.36
Reza Akbarinia     2      254        25.77
Florent Masseglia  3      408        43.08