Title
KV-match: An Efficient Subsequence Matching Approach for Large Scale Time Series.
Abstract
Time series data have exploded due to the popularity of new applications, like data center management and IoT. Time series data management system (TSDB), emerges to store and query the large volume of time series data. Subsequence matching is critical in many time series mining algorithms, and extensive approaches have been proposed. However, the shift of distributed storage system and the performance gap make these approaches not compatible with TSDB. To fill this gap, we propose a new index structure, KV-index, and the corresponding matching algorithm, KV-match. KV-index is a file-based structure, which can be easily implemented on local files, HDFS or HBase tables. KV-match algorithm probes the index efficiently with a few sequential scans. Moreover, two optimization techniques, window reduction and window reordering, are proposed to further accelerate the processing. To support the query of arbitrary lengths, we extend KV-match to KV-match$_{DP}$, which utilizes multiple varied length indexes to process the query simultaneously. A two-dimensional dynamic programming algorithm is proposed to find the optimal query segmentation. We implement our approach on both local files and HBase tables, and conduct extensive experiments on synthetic and real-world datasets. Results show that our index is of comparable size to the popular tree-style index while our query processing is order of magnitudes more efficient.
Year
Venue
Field
2017
arXiv: Databases
Dynamic programming,Time series,Longest common subsequence problem,Longest increasing subsequence,Segmentation,Computer science,Distributed data store,Algorithm,Subsequence,Blossom algorithm
DocType
Volume
Citations 
Journal
abs/1710.00560
0
PageRank 
References 
Authors
0.34
0
5
Name
Order
Citations
PageRank
Jiaye Wu142.10
Peng Wang226121.35
Chen Wang336193.70
Wei Wang47122746.33
Jianmin Wang52446156.05