Title
Improved algorithm for parallel mining collaborative frequent itemsets in multiple data streams
Abstract
With the rapid development of World Wide Web technology, complex and diverse data are growing explosively, so frequent itemset mining plays an essential role. Because mining frequent itemsets in multiple data streams is constrained by the limited computing power of a single processor, an improved algorithm for Parallel Mining Collaborative frequent itemsets in Multiple Data streams (PMCMD-Stream) is proposed. First, the algorithm compresses the potential and frequent itemsets into a CP-Tree in a single scan and incrementally inserts or deletes the related branches of the CP-Tree, so the databases need not be scanned repeatedly to generate many candidate frequent itemsets, which saves running time. Second, the parallelized algorithm runs in the MapReduce programming environment: because each candidate frequent itemset is independent, different candidate frequent itemsets can be processed concurrently by multiple computing machines. Finally, the valuable frequent itemsets, namely the global collaborative frequent itemsets, are obtained. The experimental results show that the PMCMD-Stream algorithm not only improves mining efficiency but also scales much better than existing algorithms, making it suitable for discovering collaborative frequent itemsets in large-scale data streams.
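The single-scan, incremental idea in the abstract can be illustrated with a small sketch. This is not the authors' CP-Tree, whose node layout and pruning rules the abstract does not specify; it is a generic prefix tree with per-node counts over a fixed-size sliding window, and the names `WindowedPrefixTree`, `add`, and `prefix_count` are hypothetical. Each arriving transaction is read once, inserted as one branch, and the oldest transaction's branch is decremented and pruned when the window is full, so nothing is rescanned.

```python
from collections import deque


class Node:
    """One prefix-tree node: an item, its count, and child nodes keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class WindowedPrefixTree:
    """Hypothetical stand-in for a compact pattern tree over a sliding window."""

    def __init__(self, window_size):
        self.root = Node()
        self.window = deque()  # transactions currently inside the window
        self.window_size = window_size

    def _update(self, transaction, delta):
        # Walk the branch of the sorted transaction, adjusting counts by delta.
        node = self.root
        for item in sorted(transaction):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += delta
            if child.count == 0:
                # No transaction uses this branch any more, so prune it.
                del node.children[item]
                return
            node = child

    def add(self, transaction):
        """Insert a new transaction, evicting the oldest one if the window is full."""
        transaction = frozenset(transaction)
        if len(self.window) == self.window_size:
            self._update(self.window.popleft(), -1)
        self.window.append(transaction)
        self._update(transaction, +1)

    def prefix_count(self, items):
        """Number of window transactions whose sorted item list starts with `items`."""
        node = self.root
        for item in sorted(items):
            node = node.children.get(item)
            if node is None:
                return 0
        return node.count
```

Sorting items before insertion is an assumption made here so that transactions sharing a prefix share a branch. A full miner would additionally traverse the tree to enumerate itemsets above a support threshold, which this sketch leaves out.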
Year: 2019
DOI: 10.1007/s10586-018-1859-y
Venue: Cluster Computing
Keywords: Stream data mining, Multiple data streams, Parallel algorithm, Sliding window, Frequent itemsets, Collaborative frequent itemsets
DocType: Journal
Volume: 22
Issue: Supplement
ISSN: 1573-7543
Citations: 0
PageRank: 0.34
References: 17
Authors: 3
Name            Order   Citations   PageRank
FangAi Liu      1       39          13.94
Qianqian Wang   2       0           0.34
Xin Wang        3       0           18.25