Title
Scalable Filtering of Multiple Generalized-Tree-Pattern Queries over XML Streams
Abstract
An XML publish/subscribe system needs to filter a large number of queries over XML streams. Most existing systems only consider filtering the simple XPath statements. In this paper, we focus on filtering of the more complex Generalized-Tree-Pattern (GTP) queries. Our filtering mechanism is based on a novel Tree-of-Path (TOP) encoding scheme, which compactly represents the path matches for the entire document. First, we show that the TOP encodings can be efficiently produced via a shared bottom-up path matching. Second, with the aid of this TOP encoding, we can 1) achieve polynomial time and space complexity for post processing, 2) avoid redundant predicate evaluations, 3) allow an efficient duplicate-free and merge join-based algorithm for merging multiple encoded path matches and 4) simplify the processing of GTP queries. Overall our approach maximizes the sharing opportunity across queries by exploiting the suffix as well as prefix sharing. At the same time, our TOP encodings allow efficient post processing for GTP queries. Extensive performance studies show that our GFilter solution not only achieves significantly better filtering performance than state-of-the-art algorithms, but also is capable of efficiently filtering the more complex GTP queries.
Year
DOI
Venue
2008
10.1109/TKDE.2008.83
IEEE Trans. Knowl. Data Eng.
Keywords
Field
DocType
gtp query,top encoding,xml stream,xml streams,top encodings,efficient post processing,post processing,complex generalized-tree-pattern,multiple generalized-tree-pattern queries,multiple encoded path match,complex gtp query,shared bottom-up path matching,scalable filtering,matched filters,merging,web services,computational complexity,pattern matching,xml,middleware,scalability,bottom up,database languages,polynomials,polynomial time,space complexity,tree data structures,encoding
Data mining,Query language,XML,Computer science,Tree (data structure),Filter (signal processing),Sort-merge join,Theoretical computer science,XPath,Time complexity,Pattern matching
Journal
Volume
Issue
ISSN
20
12
1041-4347
Citations 
PageRank 
References 
10
0.52
18
Authors
6
Name
Order
Citations
PageRank
Songting Chen139819.91
Hua-Gang Li220111.75
Jun'ichi Tatemura310012.51
Wang-Pin Hsiung447325.90
Divyakant Agrawal582011674.75
K. Selçuk Candan62218307.06