Abstract | ||
---|---|---|
A growing number of domains (finance, seismology, internet-of-things, etc.) collect massive time series. When the number of series grows to hundreds of millions or even billions, similarity queries become intractable on a single machine. Further, naive (quadratic) parallelization does not scale well, so both efficient indexing and parallelization are needed. We propose a demonstration of Spark-parSketch, a complete solution based on sketches (random projections) that efficiently performs both the parallel indexing of large sets of time series and similarity search over them. Because our method is approximate, we explore the tradeoff between time and precision. A video showing the dynamics of the demonstration is available at http://parsketch.gforge.inria.fr/video/parSketchdemo_720p.mov.
|
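To make the sketch idea in the abstract concrete, the following is a minimal single-machine illustration of random-projection sketches, not the Spark-parSketch implementation. The function names (`make_projection`, `sketch`, `euclidean`), the choice of +1/-1 random vectors, and the sketch length are all assumptions made for this example. Each series is reduced to a short vector of dot products with random vectors, and the distance between two sketches approximates the Euclidean distance between the original series.

```python
import math
import random


def make_projection(series_len, sketch_len, seed=0):
    """Build sketch_len random vectors of +1/-1 entries (an assumed choice),
    each as long as the input series."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(series_len)]
            for _ in range(sketch_len)]


def sketch(series, projection):
    """Dot the series with every random vector; the 1/sqrt(k) scaling makes
    sketch distances estimate the original Euclidean distances."""
    scale = 1.0 / math.sqrt(len(projection))
    return [scale * sum(r * x for r, x in zip(row, series))
            for row in projection]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    n, k = 256, 64  # series length and sketch length (illustrative values)
    proj = make_projection(n, k)
    x = [math.sin(0.1 * i) for i in range(n)]
    y = [math.sin(0.1 * i + 0.5) for i in range(n)]
    print("exact distance: ", euclidean(x, y))
    print("sketch distance:", euclidean(sketch(x, proj), sketch(y, proj)))
```

A longer sketch gives a tighter estimate at the cost of more computation per series, which is one simple form of the time versus precision tradeoff the abstract mentions.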
Year | DOI | Venue |
---|---|---|
2018 | 10.1145/3269206.3269226 | CIKM |
Keywords | Field | DocType |
time series, indexing, similarity search, distributed data processing, Spark | Data mining, Spark (mathematics), Computer science, Quadratic equation, Search engine indexing, Nearest neighbor search | Conference |
ISBN | Citations | PageRank |
978-1-4503-6014-2 | 0 | 0.34 |
References | Authors |
10 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Oleksandra Levchenko | 1 | 6 | 1.50 |
Djamel Edine Yagoubi | 2 | 7 | 1.85 |
Reza Akbarinia | 3 | 254 | 25.77 |
Florent Masseglia | 4 | 408 | 43.08 |
Boyan Kolev | 5 | 38 | 5.47 |
Dennis E. Shasha | 6 | 17 | 6.79 |