Mind the gap: large-scale frequent sequence mining - Citegraph

Paper Info

Title
Mind the gap: large-scale frequent sequence mining

Abstract
Frequent sequence mining is one of the fundamental building blocks in data mining. While the problem has been extensively studied, few of the available techniques are sufficiently scalable to handle datasets with billions of sequences; such large-scale datasets arise, for instance, in text mining and session analysis. In this paper, we propose MG-FSM, a scalable algorithm for frequent sequence mining on MapReduce. MG-FSM can handle so-called \"gap constraints\", which can be used to limit the output to a controlled set of frequent sequences. At its heart, MG-FSM partitions the input database in a way that allows us to mine each partition independently using any existing frequent sequence mining algorithm. We introduce the notion of w-equivalency, which is a generalization of the notion of a \"projected database\" used by many frequent pattern mining algorithms. We also present a number of optimization techniques that minimize partition size, and therefore computational and communication costs, while still maintaining correctness. Our experimental study in the context of text mining suggests that MG-FSM is significantly more efficient and scalable than alternative approaches.

Year	DOI	Venue
2013	10.1145/2463676.2465285	SIGMOD Conference
Keywords	Field	DocType
partition size,scalable algorithm,large-scale frequent sequence mining,frequent sequence mining,input database,frequent pattern mining algorithm,data mining,frequent sequence,large-scale datasets,existing frequent sequence mining,text mining	Data mining,Concept mining,Data stream mining,Text mining,GSP Algorithm,Computer science,Correctness,Molecule mining,Partition (number theory),Database,Scalability	Conference
Citations	PageRank	References
21	0.70	27
Authors
4

Authors (4 rows)

Cited by (21 rows)

References (27 rows)

Name	Order	Citations	PageRank
Iris Miliaraki	1	237	10.40
Klaus Berberich	2	1271	68.96
Rainer Gemulla	3	1081	53.54
Spyros Zoupanos	4	48	4.42

1