LCA-based selection for XML document collections - Citegraph

Paper Info

Title
LCA-based selection for XML document collections

Abstract
In this paper, we address the problem of database selection for XML document collections: given a set of collections and a user query, rank the collections by their goodness for the query, where goodness is determined by the relevance of the documents in each collection to the query. We consider keyword queries and support Lowest Common Ancestor (LCA) semantics for defining query results: the relevance of each document to a query is determined by properties of the LCA of the nodes in the XML document that contain the query keywords. To avoid evaluating queries against every document in a collection, we propose maintaining, in a preprocessing phase, information about the LCAs of all pairs of keywords in a document, and using it to approximate the properties of the LCA-based results of a query. To improve storage and processing efficiency, we summarize this LCA information with Bloom filters. We address both a Boolean and a weighted version of the database selection problem. Our experimental results show that our approach incurs low errors in estimating the goodness of a collection and produces rankings that are very close to the actual ones.
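The abstract does not specify the summary layout, so what follows is only a minimal Python sketch of the pairwise-LCA-plus-Bloom-filter idea under stated assumptions. It is not the authors' implementation. It assumes a document counts as relevant when, for every pair of query keywords, the deepest LCA of their occurrences sits at depth `min_depth` or below. That threshold is a hypothetical stand-in for the paper's LCA properties. Each document's qualifying keyword pairs go into one Bloom filter. A collection's Boolean goodness is the number of documents whose filter accepts every query pair. All names (`summarize`, `matches`, `rank_collections`, `min_depth`) are hypothetical, and the weighted variant is not shown.

```python
import hashlib
import xml.etree.ElementTree as ET
from itertools import combinations


class BloomFilter:
    """Minimal Bloom filter over strings: no false negatives, some false positives."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def keyword_paths(xml_text):
    """Map each keyword to the root-to-node paths (tuples of child indices)
    of elements whose own text contains it (tail text is ignored here)."""
    occurrences = {}

    def visit(elem, path):
        for token in (elem.text or "").lower().split():
            occurrences.setdefault(token, []).append(path)
        for i, child in enumerate(elem):
            visit(child, path + (i,))

    visit(ET.fromstring(xml_text), ())
    return occurrences


def lca_depth(p, q):
    """Depth of the LCA of two nodes = length of their common path prefix (root = 0)."""
    depth = 0
    for a, b in zip(p, q):
        if a != b:
            break
        depth += 1
    return depth


def summarize(xml_text, min_depth=1, m_bits=1024, k_hashes=3):
    """Preprocessing (hypothetical policy): insert every keyword, plus every keyword
    pair whose deepest LCA over all occurrence pairs is at depth >= min_depth."""
    occ = keyword_paths(xml_text)
    bf = BloomFilter(m_bits, k_hashes)
    for kw in occ:
        bf.add(kw)  # unary keys let single-keyword queries be answered
    for k1, k2 in combinations(sorted(occ), 2):  # sorted -> canonical pair order
        best = max(lca_depth(p, q) for p in occ[k1] for q in occ[k2])
        if best >= min_depth:
            bf.add(k1 + "\x00" + k2)
    return bf


def matches(bf, query):
    """Boolean estimate: all keywords present and every keyword pair qualified."""
    kws = sorted({w.lower() for w in query})
    if not all(kw in bf for kw in kws):
        return False
    return all(k1 + "\x00" + k2 in bf for k1, k2 in combinations(kws, 2))


def rank_collections(corpora, query):
    """corpora: collection name -> list of per-document summaries.
    Goodness = estimated number of matching documents; highest first."""
    scores = {name: sum(matches(bf, query) for bf in summaries)
              for name, summaries in corpora.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Usage: in collection A both keywords share a <book>; in B they only meet at the root.
docs_a = ["<lib><book><title>xml databases</title><author>smith</author></book></lib>"]
docs_b = ["<lib><book><title>xml</title></book><book><author>smith</author></book></lib>"]
corpora = {"A": [summarize(d) for d in docs_a], "B": [summarize(d) for d in docs_b]}
print(rank_collections(corpora, ["xml", "smith"]))
```

A Bloom filter never rejects an inserted key. In this sketch, estimation errors therefore come from two places. False positives can only overestimate a collection's goodness. The pairwise approximation can also mislead, because every pair qualifying separately does not guarantee a single deep LCA for all query keywords together.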

Year	DOI	Venue
2010	10.1145/1772690.1772743	WWW
Keywords	Field	DocType
bloom filter,lca information,database selection,lca-based selection,query result,database selection problem,xml document collection,user query,query keyword,xml document,keyword query,lowest common ancestor,xml	Query optimization,Data mining,World Wide Web,Lowest common ancestor,XML,Well-formed document,Information retrieval,XML validation,Computer science,Document Structure Description,Web query classification,Simple API for XML	Conference
Citations	PageRank	References	Authors
4	0.46	26	2

Authors (2 rows)

Name	Order	Citations	PageRank
Georgia Koloniari	1	220	16.49
Evaggelia Pitoura	2	1968	321.56