Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce - Citegraph

Paper Info

Title
Optimizing the Data-Process Relationship for Fast Mining of Frequent Itemsets in MapReduce

Abstract
Despite crucial recent advances, the problem of frequent itemset mining still faces major challenges. This is particularly the case when (i) the mining process must be massively distributed and (ii) the minimum support (MinSup) is very low. In this paper, we study the effectiveness and leverage of specific data placement strategies for improving parallel frequent itemset mining (PFIM) performance in MapReduce, a highly distributed computation framework. By offering a clever data placement and an optimal organization of the extraction algorithms, we show that itemset discovery effectiveness depends not only on the deployed algorithms. We propose ODPR (Optimal Data-Process Relationship), a solution for fast mining of frequent itemsets in MapReduce. Our method makes it possible to discover itemsets from massive datasets where standard solutions from the literature do not scale. Indeed, in a massively distributed environment, the arrangement of both the data and the different processes can make the global job either completely inoperative or very effective. Our proposal has been evaluated using real-world datasets, and the results illustrate a significant scale-up obtained with very low MinSup, which confirms the effectiveness of our approach.
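The abstract does not describe ODPR's internals. As background only, the sketch below simulates in plain Python a generic partition-based scheme for parallel frequent itemset mining: the two-pass approach often called SON (Savasere, Omiecinski and Navathe). Each hypothetical mapper mines its own data split with a proportional local threshold, the union of local results forms the global candidate set, and a second pass sums candidate counts across splits and keeps those reaching MinSup. The function names `local_frequent` and `mine_partitioned` and the round-robin split are assumptions made for illustration; this is not the authors' ODPR method.

```python
from collections import Counter
from math import ceil


def local_frequent(partition, min_sup_ratio):
    """Hypothetical phase-1 mapper: return every itemset that is frequent
    inside one partition, via a simple level-wise (Apriori-style) search."""
    # Local threshold scales with partition size, so any globally frequent
    # itemset is guaranteed to be locally frequent in at least one partition.
    threshold = ceil(min_sup_ratio * len(partition))
    frequent = set()
    k = 1
    candidates = {frozenset([item]) for t in partition for item in t}
    while candidates:
        # Count how many transactions contain each candidate (subset test).
        counts = Counter(c for t in partition for c in candidates if c <= t)
        level = {c for c, n in counts.items() if n >= threshold}
        frequent |= level
        k += 1
        # Join step (no Apriori subset pruning): correct, though not efficient.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
    return frequent


def mine_partitioned(transactions, min_sup_ratio, n_partitions):
    """Simulate two MapReduce passes over n_partitions hypothetical mappers.
    Returns {itemset: global support count} for itemsets meeting MinSup."""
    tx = [frozenset(t) for t in transactions]
    # Data placement assumption: round-robin assignment of transactions.
    partitions = [tx[i::n_partitions] for i in range(n_partitions)]
    # Pass 1: map = local mining, reduce = union of local frequent itemsets.
    candidates = set().union(*(local_frequent(p, min_sup_ratio) for p in partitions))
    # Pass 2: map = count candidates per partition, reduce = sum the counts.
    totals = Counter()
    for p in partitions:
        totals.update(c for t in p for c in candidates if c <= t)
    threshold = ceil(min_sup_ratio * len(tx))
    return {c: n for c, n in totals.items() if n >= threshold}


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}, {"a"}]
    for itemset, count in mine_partitioned(data, 0.5, 2).items():
        print(sorted(itemset), count)
```

With this toy data and MinSup = 0.5 (3 of 6 transactions), `{a, b, c}` is locally frequent in one split but has a global support of only 2, so the second pass discards it. As MinSup drops, local thresholds shrink and each split emits many more candidates, which is one plausible way the placement of data across mappers can dominate the cost of the whole job.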

Year: 2015
DOI: 10.1007/978-3-319-21024-7_15
Venue: Machine Learning and Data Mining in Pattern Recognition
Field: Data mining, Data set, Data processing, Distributed Computing Environment, Computer science, Artificial intelligence, Machine learning, Computation
DocType: Conference
Volume: 9166
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 11
Authors: 3

Authors

Name	Order	Citations	PageRank
Saber Salah	1	0	0.34
Reza Akbarinia	2	254	25.77
Florent Masseglia	3	408	43.08
