Title
XPathLearner: an on-line self-tuning Markov histogram for XML path selectivity estimation
Abstract
The Extensible Markup Language (XML) is gaining widespread use as a format for data exchange and storage on the World Wide Web. Queries over XML data require accurate selectivity estimation of path expressions to optimize query execution plans. Selectivity estimation of XML path expressions is usually based on summary statistics about the structure of the underlying XML repository. All previous methods require an off-line scan of the XML repository to collect the statistics. In this paper, we propose XPathLearner, a method for estimating the selectivity of the most commonly used types of path expressions without looking at the XML data. XPathLearner gathers and refines the statistics using query feedback in an on-line manner, and is especially suited to queries in Internet-scale applications, where the underlying XML repository is either inaccessible or too large to be scanned in its entirety. Besides the on-line property, our method has two other novel features: (a) XPathLearner is workload-aware in collecting the statistics and can therefore be more accurate than the more costly off-line methods under tight memory constraints, and (b) XPathLearner automatically adjusts the statistics using query feedback when the underlying XML data change. We show empirically the estimation accuracy of our method using several real data sets.
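The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea: a first-order Markov table over tag pairs that is never built from the data and is tuned purely from query feedback. The class name MarkovPathHistogram, the default count, the learning rate, and the multiplicative update rule are all assumptions made for illustration, not the paper's actual data structure or update strategy. Paths are treated as plain tag sequences such as a/b/c.

```python
EPS = 1e-9  # guard against division by zero (illustrative choice)


class MarkovPathHistogram:
    """Toy first-order Markov table over XML tags, tuned only by query feedback.

    Hypothetical sketch: names and update rules are assumptions, not the
    method described in the paper.
    """

    def __init__(self, learning_rate=0.5, default_count=1.0):
        self.learning_rate = learning_rate
        self.default_count = default_count  # used for tags/pairs never seen in feedback
        self.tag_count = {}    # tag -> estimated number of elements with that tag
        self.pair_count = {}   # (parent, child) -> estimated number of parent/child matches

    def _tag(self, tag):
        return self.tag_count.get(tag, self.default_count)

    def _pair(self, parent, child):
        return self.pair_count.get((parent, child), self.default_count)

    @staticmethod
    def _split(path):
        tags = [t for t in path.split("/") if t]
        if not tags:
            raise ValueError("empty path")
        return tags

    def estimate(self, path):
        """Estimate the match count of a tag path under a first-order Markov assumption."""
        tags = self._split(path)
        if len(tags) == 1:
            return self._tag(tags[0])
        # count(t1/t2) * prod over later steps of count(ti/ti+1) / count(ti)
        est = self._pair(tags[0], tags[1])
        for cur, nxt in zip(tags[1:], tags[2:]):
            est *= self._pair(cur, nxt) / max(self._tag(cur), EPS)
        return est

    def feedback(self, path, true_count):
        """Refine stored counts using the observed result size of an executed query."""
        tags = self._split(path)
        est = self.estimate(path)
        if len(tags) == 1:
            # Move the stored tag count toward the observed value.
            self.tag_count[tags[0]] = est + self.learning_rate * (true_count - est)
            return
        # Spread the correction evenly (in log space) over the pairs on the path.
        ratio = max(true_count, EPS) / max(est, EPS)
        factor = ratio ** (self.learning_rate / (len(tags) - 1))
        for parent, child in zip(tags, tags[1:]):
            self.pair_count[(parent, child)] = self._pair(parent, child) * factor


if __name__ == "__main__":
    h = MarkovPathHistogram(learning_rate=1.0)
    h.feedback("b", 10)      # observed: 10 elements tagged b
    h.feedback("a/b", 4)     # observed: 4 matches of a/b
    h.feedback("b/c", 20)    # observed: 20 matches of b/c
    print(h.estimate("a/b/c"))
```

In this sketch the table only ever holds entries for tags and pairs that appeared in some query's feedback, which is one simple way to picture the workload-aware and self-adjusting behaviour the abstract mentions. A real implementation would also need a memory bound and a policy for evicting or merging entries.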
Year
2002
Venue
VLDB
Keywords
xml path selectivity estimation, xml repository, on-line self-tuning markov histogram, costly off-line method, path expression, xml path expression, xml data, underlying xml repository, underlying xml data change, query feedback, data exchange, world wide web
Field
Data mining, XML framework, Efficient XML Interchange, Streaming XML, XML, Information retrieval, XML Schema (W3C), XML validation, Computer science, XML database, Simple API for XML, Database
DocType
Conference
Citations
49
PageRank
1.89
References
10
Authors
5
Name                  Order  Citations  PageRank
Lipyeow Lim           1      384        35.36
Min Wang              2      1662       192.58
Sriram Padmanabhan    3      466        62.40
Jeffrey Scott Vitter  4      6628       1246.72
Ronald Parr           5      2428       186.85