Generating data series query workloads.

Paper Info

Title
Generating data series query workloads.

Abstract
Data series (including time series) have attracted a lot of interest in recent years. Most of the research has focused on how to efficiently support similarity or nearest neighbor queries over large data series collections (an important data mining task), and several data series summarization and indexing methods have been proposed to solve this problem. Up to this point, very little attention has been paid to properly evaluating such index structures, with most previous works relying solely on randomly selected data series as queries. In this work, we show that random workloads are inherently unsuitable for the task at hand, and we argue that there is a need for carefully generated query workloads. We define measures that capture the characteristics of queries, and we propose a method for generating workloads with the desired properties, that is, workloads that effectively evaluate and compare data series summarizations and indexes. In our experimental evaluation, with carefully controlled query workloads, we shed light on key factors affecting the performance of nearest neighbor search in large data series collections. This is the first paper that introduces a method for quantifying the hardness of data series queries, as well as the ability to generate queries of predefined hardness.

Year	DOI	Venue
2018	10.1007/s00778-018-0513-x	VLDB J.
Keywords	Field	DocType
Time series, Data series, Similarity search, Indexing, Query workload generation	k-nearest neighbors algorithm, Data mining, Automatic summarization, Computer science, Search engine indexing, Data series, Nearest neighbor search	Journal
Volume	Issue	ISSN
27	6	1066-8888
Citations	PageRank	References
3	0.38	31
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Kostas Zoumpatianos	1	88	8.08
Yin Lou	2	506	28.82
Ioana Ileana	3	21	2.58
Themis Palpanas	4	1136	91.61
Johannes Gehrke	5	13362	1055.06
