Approximate Processing of Massive Continuous Quantile Queries over High-Speed Data Streams - Citegraph

Paper Info

Title
Approximate Processing of Massive Continuous Quantile Queries over High-Speed Data Streams

Abstract
Quantile computation has many applications including data mining and financial data analysis. It has been shown that an \epsilon{\hbox{-}}{\rm{approximate}} summary can be maintained so that, given a quantile query (\phi, \epsilon), the data item at rank \lceil \phi N \rceil may be approximately obtained within the rank error precision \epsilon N over all N data items in a data stream or in a sliding window. However, scalable online processing of massive continuous quantile queries with different \phi and \epsilon poses a new challenge because the summary is continuously updated with new arrivals of data items. In this paper, first we aim to dramatically reduce the number of distinct query results by grouping a set of different queries into a cluster so that they can be processed virtually as a single query while the precision requirements from users can be retained. Second, we aim to minimize the total query processing costs. Efficient algorithms are developed to minimize the total number of times for reprocessing clusters and to produce the minimum number of clusters, respectively. The techniques are extended to maintain near-optimal clustering when queries are registered and removed in an arbitrary fashion against whole data streams or sliding windows. In addition to theoretical analysis, our performance study indicates that the proposed techniques are indeed scalable with respect to the number of input queries as well as the number of items and the item arrival rate in a data stream.

Year	DOI	Venue
2006	10.1109/TKDE.2006.73	IEEE Trans. Knowl. Data Eng.
Keywords	Field	DocType
data mining,data item,massive continuous quantile queries,total number,financial data analysis,minimum number,high-speed data streams,n data item,data mining.,online computation,approximate processing,whole data stream,distinct query result,index terms—query processing,different query,data stream,sliding window,clustering algorithms,application software,computational complexity,computer science,cost efficiency,data analysis,computer applications,indexing terms	Data mining,Data stream mining,Data stream,Computer science,Artificial intelligence,Cluster analysis,Sliding window protocol,Algorithm,Quantile,Information extraction,Machine learning,Computational complexity theory,Scalability	Journal
Volume	Issue	ISSN
18	5	1041-4347
Citations	PageRank	References
8	0.59	29
Authors
7

Authors (7 rows)

Cited by (8 rows)

References (29 rows)

Name	Order	Citations	PageRank
Xuemin Lin	1	5585	307.32
Jian Xu	2	56	3.44
Qing Zhang	3	567	25.85
Hongjun LU	4	3580	752.13
Jeffrey Xu Yu	5	7018	464.96
Xiaofang Zhou	6	5381	342.70
Yidong Yuan	7	904	30.59

1