Title
Stream-join revisited in the context of epoch-based SQL continuous query
Abstract
The current generation of stream processing systems is in general built separately from the query engine thus lacks the expressive power of SQL and causes significant overhead in data access and movement. This situation has motivated us to leverage the query engine for stream processing. Stream-join is a window operation where the key issue is how to punctuate and pair two or more correlated streams. In this work we tackle this issue in the specific context of query engine supported stream processing. We focus on the following problems: a SQL query is definable on bounded relation data but stream data are unbounded, and join multiple streams is a stateful (thus history-sensitive) operation but a SQL query only cares about the current state; further, relation join typically requires relation re-scan in a nested-loop but by nature a stream cannot be re-captured as reading a stream always gets newly incoming data. To leverage query processing for analyzing unbounded stream, we defined the Epoch-based Continuous Query (ECQ) model which allows a SQL query to be executed epoch by epoch for processing the stream data chunk by chunk. However, unlike multiple one-time queries, an ECQ is a single, continuous query instance across execution epochs for keeping the continuity of the application state as required by the history-sensitive operations such as sliding-window join. To joining multiple streams, we further developed the techniques to cache one or more consecutive data chunks falling in a sliding window across query execution epochs in the ECQ instance, to allow them to be re-delivered from the cache. In this way join multiple streams and self-join a single stream in the data chunk based window or sliding window, with various pairing schemes, are made possible. We extended the PostgreSQL engine to support the proposed approach. Our experience has demonstrated its value.
Year
DOI
Venue
2012
10.1145/2351476.2351491
IDEAS
Keywords
Field
DocType
continuous query instance,multiple one-time query,stream processing,multiple stream,epoch-based sql continuous query,single stream,query engine,correlated stream,query execution epoch,sql query,query processing,data access,nested loops,relational data,sliding window,expressive power
Query optimization,SQL,Data mining,Query expansion,Computer science,Sargable,Web query classification,Query by Example,Spatial query,Database,Boolean conjunctive query
Conference
Citations 
PageRank 
References 
0
0.34
9
Authors
2
Name
Order
Citations
PageRank
Qiming Chen12010233.16
Meichun Hsu23437778.34