Title
Using a relational database for scalable XML search
Abstract
XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments. Scalable technologies are needed to effectively manage the growing volumes of XML data. A wide variety of methods exist for storing and searching XML data; the two most common techniques are conventional tree-based and relational approaches. Tree-based approaches represent XML as a tree and use indexes and path join algorithms to process queries. In contrast, the relational approach uses the power of a mature relational database to store and search XML. This approach maps XML queries to SQL and reconstructs the XML from the database results. To date, the relational approach to XML processing has seen limited acceptance because the relational schema must be redesigned each time a new XML hierarchy is defined. We, in contrast, describe a fixed-schema relational approach that eliminates the need for schema redesign at the expense of potentially longer runtimes. We show, however, that these potentially longer runtimes are still significantly shorter than those of the tree-based approach. To compare the scalability of both approaches, we used the XBench benchmark to generate large collections of heterogeneous XML documents ranging in size from 500 MB to 8 GB. We measured scalability by running, on each collection, XML queries that cover a wide range of XML search features, and tracked how each method scales for different query features as the collection size increases. In addition, we examine the performance of each method as the result size and the number of predicates increase. Our results show that our relational approach provides scalable XML retrieval by leveraging existing relational database optimizations. Furthermore, we show that the relational approach typically outperforms the tree-based approach while scaling consistently over all collections studied.
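The abstract does not spell out the authors' fixed relational schema or their query translation, so the following is only a minimal sketch of the general idea, assuming a single generic node table. The table name node and the functions shred and path_to_sql are hypothetical, not the authors' design. Every element of any XML document becomes one row, so a new hierarchy needs no schema change, and a root-anchored child path such as /catalog/item/title becomes a chain of SQL self-joins that the relational optimizer can plan. The sketch uses only Python's standard sqlite3 and xml.etree.ElementTree modules and handles neither predicates, attributes, nor the descendant axis.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical fixed schema: one row per element, for every document.
# New XML hierarchies add rows, never columns or tables.
SCHEMA = """
CREATE TABLE node (
    id     INTEGER PRIMARY KEY,
    doc    INTEGER NOT NULL,
    parent INTEGER,  -- NULL marks a document root
    tag    TEXT NOT NULL,
    text   TEXT
)
"""


def shred(conn, doc_id, xml_text):
    """Store every element of xml_text as a row of the node table."""

    def insert(elem, parent_id):
        cur = conn.execute(
            "INSERT INTO node (doc, parent, tag, text) VALUES (?, ?, ?, ?)",
            (doc_id, parent_id, elem.tag, (elem.text or "").strip() or None),
        )
        node_id = cur.lastrowid
        for child in elem:  # depth-first, so ids follow document order
            insert(child, node_id)

    insert(ET.fromstring(xml_text), None)


def path_to_sql(path):
    """Translate a root-anchored child-axis path (/a/b/c) into self-joins."""
    steps = [s for s in path.split("/") if s]
    if not steps:
        raise ValueError("path must contain at least one step")
    joins = ["node n0"]
    where = ["n0.parent IS NULL", "n0.tag = ?"]
    for i in range(1, len(steps)):
        # Each further path step is one self-join on the parent/child edge.
        joins.append(f"JOIN node n{i} ON n{i}.parent = n{i - 1}.id")
        where.append(f"n{i}.tag = ?")
    last = len(steps) - 1
    sql = (
        f"SELECT n{last}.doc, n{last}.text FROM {' '.join(joins)} "
        f"WHERE {' AND '.join(where)} ORDER BY n{last}.id"
    )
    return sql, steps  # the tag names are the bind parameters, in order


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
shred(conn, 1, "<catalog><item><title>XML</title></item>"
               "<item><title>SQL</title></item></catalog>")
shred(conn, 2, "<library><book><title>Trees</title></book></library>")

sql, params = path_to_sql("/catalog/item/title")
rows = conn.execute(sql, params).fetchall()
print(rows)  # [(1, 'XML'), (1, 'SQL')]
```

Loading the second document, which has a different hierarchy, required no DDL. In this sketch the price is one self-join per path step, one plausible source of the potentially longer runtimes the abstract mentions.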
Year
2008
DOI
10.1007/s11227-007-0153-1
Venue
The Journal of Supercomputing
Keywords
XML retrieval, Relational database
Field
Efficient XML Interchange, XML Encryption, Streaming XML, Computer science, XML validation, Document Structure Description, XML database, XML schema, Database, XML Schema Editor
DocType
Journal
Volume
44
Issue
2
ISSN
0920-8542
Citations
5
PageRank
0.49
References
44
Authors
5
Name | Order | Citations | PageRank
Rebecca J. Cathey | 1 | 8 | 1.04
Steven M. Beitzel | 2 | 696 | 46.72
Eric C. Jensen | 3 | 696 | 46.72
David Grossman | 4 | 525 | 34.73
Ophir Frieder | 5 | 3300 | 419.55