Title
Two Novel Adaptive Symbolic Representations for Similarity Search in Time Series Databases
Abstract
Over the last decade, interest in time series data mining has grown steadily because of its wide range of real-world applications. Numerous representation models of time series have been proposed for data mining, including piecewise polynomial models, spectral models, and the more recently proposed symbolic models, such as Symbolic Aggregate approXimation (SAX) and its multiresolution extension, indexable Symbolic Aggregate approXimation (iSAX). Despite its advantages in dimensionality/numerosity reduction and its lower-bounding distance measures, the quality of the SAX approximation depends heavily on the time series being Gaussian distributed, especially when the dimensionality is reduced. In this paper, we introduce a novel adaptive symbolic approach, based on a combination of SAX and the k-means algorithm, which we call adaptive SAX (aSAX). The proposed representation greatly outperforms classic SAX not only on datasets that are highly Gaussian distributed, but also on datasets that are not, across a variety of dimensionality reduction settings. aSAX is competitive with, or superior to, classic SAX, and we further extend it to a multiresolution symbolic representation called indexable adaptive SAX (iaSAX). Our empirical experiments with real-world time series datasets confirm the theoretical analyses as well as the efficiency of the two proposed algorithms in terms of tightness of lower bound, pruning power, and number of random disk accesses.
Year
2010
DOI
10.1109/APWeb.2010.23
Venue
Web Conference
Keywords
adaptive sax,real-world time series datasets,time series,classic sax,novel adaptive symbolic representations,indexable adaptive,sax approximation,similarity search,time series data,gaussian distribution datasets,proposed algorithm,time series databases,proposed representation,gaussian distribution,data mining,indexation,databases,k means algorithm,chebyshev approximation,lower bound,indexing,polynomials,time series analysis,database management systems,time measurement
Field
k-means clustering,Data mining,Dimensionality reduction,Computer science,Approximation theory,Curse of dimensionality,Gaussian,Time series database,Nearest neighbor search,Piecewise,Database
DocType
Conference
ISBN
978-1-4244-6600-9
Citations
11
PageRank
0.65
References
10
Authors
3
Name, Order, Citations, PageRank
Ninh Pham, 1, 169, 7.68
Quang Loc Le, 2, 65, 9.48
Tran Khanh Dang, 3, 190, 40.04