Title
MESSIAH: missing element-conscious SLCA nodes search in XML data
Abstract
Keyword search for smallest lowest common ancestors (SLCAs) in XML data has been widely accepted as a meaningful way to identify matching nodes whose subtrees contain an input set of keywords. Although SLCA and its variants (e.g., MLCA) perform admirably in identifying matching nodes, surprisingly, they perform poorly for searches on irregular schemas that have missing elements, that is, (sub)elements that are optional, or that appear in some instances of an element type but not all (e.g., a "population" subelement in a "city" element might be optional, appearing when the population is known and absent when it is unknown). In this paper, we generalize the SLCA search paradigm to support queries involving missing elements. Specifically, we propose a novel property called optionality resilience that specifies the desired behaviors of an XML keyword search (XKS) approach for queries involving missing elements. We present two variants of a novel algorithm called MESSIAH (Missing Element-conSciouS hIgh-quality SLCA searcH), which are optionality resilient on irregular documents. MESSIAH logically transforms an XML document into a minimal full document in which all missing elements are represented as empty elements, i.e., the irregular schema is made "regular", and then employs efficient strategies to identify partial and complete full SLCA nodes (SLCA nodes in the full document) from it. In particular, it generates the same SLCA nodes as any state-of-the-art approach when the query does not involve missing elements, but avoids irrelevant results when missing elements are involved. Our experimental study demonstrates the ability of MESSIAH to produce superior-quality search results.
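The abstract combines two steps: padding missing elements with empty ones, then running an SLCA search over the result. The sketch below illustrates both on a toy document under simplifying assumptions that are not from the paper: a keyword matches an element if it equals the tag or a whitespace-separated token of its text, and the "full document" is approximated by giving each element an empty child for every tag seen under any element with the same tag (one level only). The names `matches`, `slca`, `pad_missing`, and `DOC` are hypothetical. This is not the MESSIAH algorithm itself, which distinguishes partial from complete full SLCAs and uses more efficient strategies than a full tree walk.

```python
import xml.etree.ElementTree as ET

# Toy document (hypothetical): "population" is optional under "city".
DOC = (
    "<country>"
    "<city><name>Toronto</name><population>2794356</population></city>"
    "<city><name>Vancouver</name></city>"
    "</country>"
)


def matches(elem, keyword):
    """Assumed rule: keyword equals the tag or a whitespace-separated token of the text."""
    kw = keyword.lower()
    return elem.tag.lower() == kw or kw in (elem.text or "").lower().split()


def slca(root, keywords):
    """Elements whose subtree covers every keyword while no child subtree does."""
    wanted = {k.lower() for k in keywords}
    result = []

    def visit(elem):
        covered = {k for k in wanted if matches(elem, k)}
        child_has_all = False
        for child in elem:
            child_covered = visit(child)
            child_has_all = child_has_all or child_covered == wanted
            covered |= child_covered
        if covered == wanted and not child_has_all:
            # SLCAs never nest, so post-order appends them in document order
            result.append(elem)
        return covered

    visit(root)
    return result


def pad_missing(root):
    """Hypothetical 'full document' step (one level only): every element receives an
    empty child for each tag seen under any element with the same tag."""
    child_tags = {}
    for elem in root.iter():
        child_tags.setdefault(elem.tag, set()).update(child.tag for child in elem)
    for elem in list(root.iter()):  # snapshot: newly added empties are not revisited
        present = {child.tag for child in elem}
        for tag in sorted(child_tags[elem.tag] - present):
            ET.SubElement(elem, tag)  # empty element standing in for the missing one
    return root


if __name__ == "__main__":
    for query in (["Vancouver", "population"], ["Toronto", "population"]):
        original = slca(ET.fromstring(DOC), query)
        padded = slca(pad_missing(ET.fromstring(DOC)), query)
        print(query,
              [(e.tag, e.findtext("name")) for e in original],
              [(e.tag, e.findtext("name")) for e in padded])
```

On the original tree, the query {Vancouver, population} returns the whole `country` element, because only Toronto has a population, so the answer mixes Vancouver with another city's data. After padding, it returns the Vancouver `city`, whose `population` is empty (roughly what the abstract calls a partial full SLCA). The query {Toronto, population}, which involves no missing element, returns the Toronto `city` in both cases.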
Year
2013
DOI
10.1145/2463676.2463699
Venue
SIGMOD Conference
Keywords
complete full slca node, irregular schema, slca search paradigm, slca node, missing element-conscious slca node, missing element, xml data, superior quality search result, xml keyword search, xml document, keyword search
Field
Data mining, Population, XML, Computer science, Keyword search, Xml data, Messiah, Schema (psychology), Common descent, Database
DocType
Conference
Citations
10
PageRank
0.53
References
19
Authors
4
Name | Order | Citations | PageRank
Ba Quan Truong | 1 | 69 | 4.98
Sourav S. Bhowmick | 2 | 1519 | 272.35
Curtis Dyreson | 3 | 277 | 40.59
Aixin Sun | 4 | 3071 | 156.89