Title
Mining neighbor-based patterns in data streams
Abstract
Discovery of complex patterns such as clusters, outliers, and associations from huge volumes of streaming data has been recognized as critical for many application domains. However, little research effort has been made toward detecting patterns within sliding window semantics, as required by real-time monitoring tasks ranging from real-time traffic monitoring to stock trend analysis. Applying static pattern detection algorithms from scratch to every window is impractical due to their high algorithmic complexity and the real-time responsiveness required by streaming applications. In this work, we develop methods for the incremental detection of neighbor-based patterns, in particular density-based clusters and distance-based outliers, over sliding stream windows. Incremental computation for pattern detection queries is challenging, because purging expired data from previously formed patterns may cause the birth, shrinkage, splitting, or termination of these complex patterns. To overcome this, we exploit the "predictability" property of sliding windows to elegantly discount the effect of expired objects with little maintenance cost. Our solution achieves guaranteed minimal CPU consumption while keeping the memory utilization linear in the number of objects in the window. To thoroughly analyze the performance of our proposed methods, we develop a cost model characterizing the performance of our proposed neighbor-based pattern mining strategies. We conduct an analytical study not only to identify the key performance factors for each strategy but also to show under which conditions each of them is most efficient. Our comprehensive experimental study, using both synthetic and real data from the domains of moving object monitoring and stock trades, demonstrates the superiority of our proposed strategies over alternative methods in both CPU processing resources and memory utilization.
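To make the "predictability" idea more concrete, here is a minimal Python sketch of distance-based outlier detection over a count-based sliding window. It is an illustration under stated assumptions, not the authors' algorithm: the class name `SlidingOutlierDetector`, its parameters, and the linear neighbor scan are all hypothetical, and unlike the method described in the abstract it keeps full neighbor lists, so its memory use is not linear in the window size. The one property it tries to show is that objects expire in arrival order, so an expired neighbor can be discounted by comparing arrival numbers instead of revisiting every pattern it belonged to.

```python
from collections import deque
from math import dist


class SlidingOutlierDetector:
    """Illustrative distance-based outlier detector (hypothetical names)."""

    def __init__(self, window_size, radius, min_neighbors):
        self.window_size = window_size      # number of objects kept in the window
        self.radius = radius                # neighbor distance threshold
        self.min_neighbors = min_neighbors  # fewer live neighbors => outlier
        self.window = deque()               # entries: (seq, point, neighbor_seqs)
        self.next_seq = 0                   # arrival number of the next object

    def insert(self, point):
        seq = self.next_seq
        self.next_seq += 1
        # Expire the oldest object. No other object's neighbor list is
        # touched here: expiry order is known in advance, so stale entries
        # are filtered lazily in outliers().
        if len(self.window) == self.window_size:
            self.window.popleft()
        neighbor_seqs = []
        # Naive linear scan for neighbors; an index would replace this.
        for other_seq, other_point, other_neighbors in self.window:
            if dist(point, other_point) <= self.radius:
                neighbor_seqs.append(other_seq)
                other_neighbors.append(seq)
        self.window.append((seq, point, neighbor_seqs))

    def outliers(self):
        if not self.window:
            return []
        oldest_live = self.window[0][0]
        result = []
        for _seq, point, neighbor_seqs in self.window:
            # A neighbor is still in the window iff its arrival number is
            # at least that of the oldest live object.
            live = sum(1 for n in neighbor_seqs if n >= oldest_live)
            if live < self.min_neighbors:
                result.append(point)
        return result
```

With a window of three objects, radius 1.0, and a minimum of one neighbor, inserting (0, 0), (0.5, 0), and (10, 10) reports (10, 10) as the only outlier. Inserting (10.5, 10) next expires (0, 0), which leaves (0.5, 0) without a live neighbor and gives (10, 10) one.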
Year
2013
DOI
10.1016/j.is.2012.08.001
Venue
Inf. Syst.
Keywords
memory utilization, neighbor-based pattern, incremental detection, proposed strategy, object monitoring, complex pattern, key performance factor, data stream, pattern detection query, proposed neighbor-based pattern mining, algorithms, clusters, outliers
Field
Data mining, Predictability, Data stream mining, Computer science, Ranging, Artificial intelligence, Computation, Sliding window protocol, Outlier, Exploit, Semantics, Database, Machine learning
DocType
Journal
Volume
38
Issue
3
ISSN
0306-4379
Citations
5
PageRank
0.40
References
23
Authors
3
Name | Order | Citations | PageRank
Di Yang | 1 | 152 | 9.72
Elke A. Rundensteiner | 2 | 4076 | 700.65
Matthew O. Ward | 3 | 1757 | 189.48