Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets. - Citegraph

Paper Info

Title
Spark-parSketch: A Massively Distributed Indexing of Time Series Datasets.

Abstract
A growing number of domains (finance, seismology, internet-of-things, etc.) collect massive time series. When the number of series grows to the hundreds of millions or even billions, similarity queries become intractable on a single machine. Further, naive (quadratic) parallelization won't work well. So, we need both efficient indexing and parallelization. We propose a demonstration of Spark-parSketch, a complete solution based on sketches / random projections to efficiently perform both the parallel indexing of large sets of time series and a similarity search on them. Because our method is approximate, we explore the tradeoff between time and precision. A video showing the dynamics of the demonstration is available at http://parsketch.gforge.inria.fr/video/parSketchdemo_720p.mov.
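
To make the idea of "sketches / random projections" concrete, here is a minimal single-machine sketch in Python. It is not the Spark-parSketch implementation: it is a generic random ±1 projection, and every name in it (`make_projection`, `sketch`, `nearest`, `demo`) and every parameter value is an assumption chosen for illustration. Each series is reduced to a short vector of scaled dot products with random ±1 vectors, and the similarity search compares those short vectors instead of the full series.

```python
import math
import random


def make_projection(series_len, sketch_len, seed=0):
    """Build sketch_len random +1/-1 vectors, each of length series_len."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(series_len)]
            for _ in range(sketch_len)]


def sketch(series, projection):
    """Project a series onto each random vector.

    The 1/sqrt(sketch_len) scale makes squared sketch distances an
    unbiased estimate of squared Euclidean distances between series.
    """
    scale = 1.0 / math.sqrt(len(projection))
    return [scale * sum(r * x for r, x in zip(row, series))
            for row in projection]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_sketch, sketches):
    """Index of the stored sketch closest to query_sketch (linear scan)."""
    return min(range(len(sketches)),
               key=lambda i: euclidean(query_sketch, sketches[i]))


def demo(n_series=50, series_len=256, sketch_len=32, target=7, seed=42):
    """Hypothetical data: random series plus a noisy copy of one of them."""
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(series_len)]
            for _ in range(n_series)]
    query = [x + rng.gauss(0.0, 0.05) for x in data[target]]
    projection = make_projection(series_len, sketch_len, seed + 1)
    index = [sketch(s, projection) for s in data]  # "indexing" step
    return nearest(sketch(query, projection), index)  # approximate search
```

Calling `demo()` returns the index of the stored series whose sketch is closest to the query's sketch. A shorter `sketch_len` makes indexing and search cheaper but the distance estimates noisier, which is the time/precision tradeoff the abstract refers to. The linear scan in `nearest` stands in for a real index structure and says nothing about how the paper distributes the work.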

Year	DOI	Venue
2018	10.1145/3269206.3269226	CIKM
Keywords	Field	DocType
time series, indexing, similarity search, distributed data processing, Spark	Data mining,Spark (mathematics),Computer science,Quadratic equation,Search engine indexing,Nearest neighbor search	Conference
ISBN	Citations	PageRank
978-1-4503-6014-2	0	0.34
References	Authors
10	6

Authors (6 rows)

Name	Order	Citations	PageRank
Oleksandra Levchenko	1	6	1.50
Djamel Edine Yagoubi	2	7	1.85
Reza Akbarinia	3	254	25.77
Florent Masseglia	4	408	43.08
Boyan Kolev	5	38	5.47
Dennis E. Shasha	6	17	6.79