Title
Storing and indexing XML documents upside down
Abstract
XML documents contain substantial redundancy in their structure part. Each path from the root node to a leaf node is explicitly represented, and typically large sets of such path instances belong to the same path class, i.e., the nodes of these path instances are labeled by the same sequence of element (or attribute) names. To save storage space and I/O cost, we want to eliminate this structural redundancy as far as possible. All known methods for the physical representation (storage) of XML documents proceed from the root via the element/attribute hierarchy (internal nodes) down to the leaves (values). We instead follow an upside-down approach, which explicitly stores the values and reconstructs the internal nodes only when needed. The cornerstones of such a solution are suitable node labels and a path synopsis that efficiently represents all path classes of an XML document. Building on these, we propose a compact internal storage format for native XML database systems in which the inner structure of the stored documents is virtualized. Because this elementless storage format allows a document to be reconstructed efficiently from its path synopsis, all processing properties are preserved, and the semantics of navigational and declarative operations of XML languages remains unchanged. Adjusted indexes support the full spectrum of so-called content-and-structure single-path queries. Apart from greatly reduced storage consumption, our approach outperforms competing methods not only for a substantial fraction of these queries but also for storing, reconstructing, and navigating XML documents.
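The interplay of prefix-based node labels and a path synopsis can be illustrated with a small sketch. This is not the authors' implementation. It assumes plain Dewey-style positional labels (one division per tree level, with no gaps reserved for later insertions), documents in which only leaf elements carry text, and no attributes or mixed content. The names `PathSynopsis`, `store_elementless`, `ancestors`, `reconstruct`, and the per-path-class integer id `pcr` are all hypothetical. Only leaf values are stored, each with its label and path class id. Inner elements are derived on demand by pairing label prefixes with the names recorded in the synopsis.

```python
import xml.etree.ElementTree as ET


class PathSynopsis:
    """Hypothetical path synopsis: one entry per path class, i.e., per
    distinct root-to-node sequence of element names."""

    def __init__(self):
        self.pcr_of = {}   # path class (tuple of names) -> integer id
        self.path_of = {}  # integer id -> path class

    def pcr(self, path):
        """Return the id of a path class, registering it on first sight."""
        if path not in self.pcr_of:
            new_id = len(self.pcr_of) + 1
            self.pcr_of[path] = new_id
            self.path_of[new_id] = path
        return self.pcr_of[path]


def store_elementless(xml_text):
    """Return (synopsis, records): one (label, pcr, value) record per leaf
    element in document order; inner elements are not stored at all."""
    synopsis = PathSynopsis()
    records = []

    def walk(elem, label, path):
        path = path + (elem.tag,)
        children = list(elem)
        if not children:
            # Leaf: keep its value, its prefix label, and its path class id.
            records.append((label, synopsis.pcr(path), elem.text or ""))
            return
        synopsis.pcr(path)  # inner path classes live only in the synopsis
        for pos, child in enumerate(children, start=1):
            walk(child, label + (pos,), path)

    walk(ET.fromstring(xml_text), (1,), ())
    return synopsis, records


def ancestors(label, pcr, synopsis):
    """Derive the virtual inner nodes above a stored leaf.

    With one label division per tree level, the label prefix of length i
    identifies the ancestor whose name is the i-th entry of the path class."""
    path = synopsis.path_of[pcr]
    if len(path) != len(label):
        raise ValueError("label depth does not match path class length")
    return [(label[:i], path[i - 1]) for i in range(1, len(label))]


def reconstruct(synopsis, records):
    """Rebuild the element tree from leaf records given in document order."""
    nodes = {}
    root = None
    for label, pcr, value in records:
        path = synopsis.path_of[pcr]
        parent = None
        for depth in range(1, len(label) + 1):
            key = label[:depth]
            if key not in nodes:
                # First time this label prefix is seen: materialize the element.
                elem = ET.Element(path[depth - 1])
                if parent is None:
                    root = elem
                else:
                    parent.append(elem)
                nodes[key] = elem
            parent = nodes[key]
        parent.text = value
    return root


doc = ("<bib><book><title>XML</title><year>2009</year></book>"
       "<book><title>DB</title></book></bib>")
synopsis, records = store_elementless(doc)
print(records)
print(ancestors((1, 2, 1), 3, synopsis))
print(ET.tostring(reconstruct(synopsis, records), encoding="unicode"))
```

In this toy document, six element nodes collapse into four path classes, and only three leaf records are kept. Reconstruction depends on the records arriving in document order, so that siblings are re-attached in their original sequence.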
Year
2009
DOI
10.1007/s00450-009-0056-x
Venue
Computer Science - R&D
Keywords
storage formats · XML indexes · native XML database management systems · elementless XML storage · path synopsis · prefix-based node labeling · XML document · management system · indexation · spectrum
CR Subject Classification
E.2, H.2.4, H.2.2
Field
XML Encryption, Efficient XML Interchange, Information retrieval, XML validation, Computer science, Parallel computing, Document Structure Description, XML database, Root element, XML schema, Database, XML Catalog
DocType
Journal
Volume
24
Issue
1-2
ISSN
1865-2042
Citations
6
PageRank
0.48
References
32
Authors
3
Name | Order | Citations | PageRank
Christian Mathis | 1 | 147 | 10.87
Theo Härder | 2 | 1132 | 307.12
Karsten Schmidt | 3 | 25 | 4.82