Title
Tree Pattern Aggregation for Scalable XML Data Dissemination
Abstract
With the rapid growth of XML-document traffic on the Internet, scalable content-based dissemination of XML documents to a large, dynamic group of consumers has become an important research challenge. To indicate the type of content that they are interested in, data consumers typically specify their subscriptions using some XML pattern specification language (e.g., XPath). Given the large volume of subscribers, system scalabil- ity and efficiency mandate the ability to aggregate the set of consumer subscriptions to a smaller set of con- tent specifications, so as to both reduce their storage- space requirements as well as speed up the document- subscription matching process. In this paper, we pro- vide the first systematic study of subscription aggre- gation where subscriptions are specified with tree pat- terns (an important subclass of XPath expressions). The main challenge is to aggregate an input set of tree pat- terns into a smaller set of generalized tree patterns such that: (1) a given space constraint on the total size of the subscriptions is met, and (2) the loss in precision (due to aggregation) during document filtering is minimized. We propose an efficient tree-pattern aggregation algo- rithm that makes effective use of document-distribution statistics in order to compute a precise set of aggregate tree patterns within the allotted space budget. As part of our solution, we also develop several novel algo- rithms for tree-pattern containment and minimization, as well as "least-upper-bound" computation for a set of tree patterns. These results are of interest in their own right, and can prove useful in other domains, such as XML query optimization. Extensive results from a pro- totype implementation validate our approach.
Year
Venue
Keywords
2002
VLDB
xml query optimization,tree pattern aggregation,aggregate tree pattern,efficient tree-pattern aggregation algorithm,precise set,tree pattern,xml pattern specification language,generalized tree pattern,xml document,scalable xml data dissemination,subscription aggregation,smaller set,upper bound,query optimization,data dissemination,specification language
Field
DocType
Citations 
Specification language,Query optimization,Data mining,XML,Expression (mathematics),Computer science,XML validation,Theoretical computer science,XPath,Database,Scalability,Speedup
Conference
39
PageRank 
References 
Authors
5.70
10
5
Name
Order
Citations
PageRank
Chee Yong Chan1643199.24
Wenfei Fan24154197.29
Pascal Felber32432178.76
Minos Garofalakis44904664.22
Rajeev Rastogi56151827.22