Title
Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets.
Abstract
A growing number of domains (finance, seismology, internet-of-things, etc.) collect massive time series. When the number of series grows to the hundreds of millions or even billions, similarity queries become intractable on a single machine. Furthermore, naive (quadratic) parallelization does not scale well. Both efficient indexing and parallelization are therefore needed. We propose a demonstration of Spark-parSketch, a complete solution based on sketches (random projections) that efficiently performs both the parallel indexing of large sets of time series and similarity search over them. Because our method is approximate, we explore the tradeoff between time and precision. A video showing the dynamics of the demonstration is available at http://parsketch.gforge.inria.fr/video/parSketchdemo_720p.mov.
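The abstract names sketches (random projections) as the core of the index. The Python snippet that follows is a minimal, single-machine illustration of that general idea under stated assumptions: each z-normalized series is reduced to its inner products with a few random vectors whose entries are +1 or -1, distances are estimated in that small sketch space, and only the closest candidates are checked exactly. It is not the Spark-parSketch implementation; the indexing structures and Spark-based distribution of the actual system are not reproduced, and all function names and parameters (`sketch_size`, `candidates`) are hypothetical.

```python
import math
import random


def znormalize(series):
    """Scale a series to zero mean and unit variance (assumed preprocessing)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def make_random_vectors(length, sketch_size, seed=0):
    """Draw sketch_size random vectors with independent +1/-1 entries."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(length)]
            for _ in range(sketch_size)]


def sketch(series, vectors):
    """The sketch of a series: its inner product with each random vector."""
    return [sum(a * b for a, b in zip(series, v)) for v in vectors]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def approx_distance(sk_a, sk_b):
    """Estimate the original Euclidean distance from two sketches.

    For a +/-1 vector r, E[(r . (a - b))^2] = ||a - b||^2, so the sketch-space
    distance is divided by sqrt(sketch_size).
    """
    return euclidean(sk_a, sk_b) / math.sqrt(len(sk_a))


def knn_with_sketches(query, database, vectors, k=1, candidates=10):
    """Approximate k-NN: rank by sketch distance, then verify a shortlist exactly."""
    q = znormalize(query)
    q_sk = sketch(q, vectors)
    normalized = [znormalize(s) for s in database]
    sketches = [sketch(s, vectors) for s in normalized]
    # Filter step: cheap distance computed in the low-dimensional sketch space.
    ranked = sorted(range(len(database)),
                    key=lambda i: approx_distance(q_sk, sketches[i]))
    shortlist = ranked[:candidates]
    # Refine step: exact distance on the shortlisted series only.
    shortlist.sort(key=lambda i: euclidean(q, normalized[i]))
    return shortlist[:k]


# Demo on synthetic data: the query is a noisy copy of series 17.
rng = random.Random(42)
length = 256
database = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(200)]
query = [x + rng.gauss(0.0, 0.1) for x in database[17]]
vectors = make_random_vectors(length, sketch_size=30, seed=1)
result = knn_with_sketches(query, database, vectors, k=1, candidates=10)
print(result)
```

A larger `sketch_size` tightens the distance estimate but costs more computation per series, and a larger `candidates` shortlist raises the chance of finding the true nearest neighbor at the price of more exact comparisons; both knobs are a small-scale analogue of the time/precision tradeoff mentioned in the abstract.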
Year
2018
DOI
10.1145/3269206.3269226
Venue
CIKM
Keywords
time series, indexing, similarity search, distributed data processing, Spark
Field
Data mining, Spark (mathematics), Computer science, Quadratic equation, Search engine indexing, Nearest neighbor search
DocType
Conference
ISBN
978-1-4503-6014-2
Citations
0
PageRank
0.34
References
10
Authors
6
Name | Order | Citations | PageRank
Oleksandra Levchenko | 1 | 6 | 1.50
Djamel Edine Yagoubi | 2 | 7 | 1.85
Reza Akbarinia | 3 | 254 | 25.77
Florent Masseglia | 4 | 408 | 43.08
Boyan Kolev | 5 | 38 | 5.47
Dennis E. Shasha | 6 | 17 | 6.79