Title
RRSi: indexing XML data for proximity twig queries
Abstract
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.
Year
2008
DOI
10.1007/s10115-008-0122-x
Venue
Knowl. Inf. Syst.
Keywords
proximity twig query, indexing xml document, xml indexing, proximity query, twig query, xml structural similarity, twig query pattern matching, indexing xml data, user query, xml query processing, twig query processing, query processing, xml document, structure-based query lookup, proximate query answer, pattern matching, indexation, structural similarity, information retrieval, computational complexity
Field
Data mining, XML Encryption, Efficient XML Interchange, Streaming XML, Information retrieval, Query expansion, Computer science, XML validation, Document Structure Description, XML database, XML Schema Editor
DocType
Journal
Volume
17
Issue
2
ISSN
0219-3116
Citations
6
PageRank
0.53
References
23
Authors
2
Name
Order
Citations
PageRank
Patrick K. L. Ng1141.11
Vincent T. Y. Ng2504122.85