Title
Structure and content scoring for XML
Abstract
XML repositories are usually queried on both structure and content. Due to the structural heterogeneity of XML, queries are often interpreted approximately and their answers are returned ranked by score. Computing answer scores in XML is an active area of research that oscillates between pure content scoring, such as the well-known tf*idf, and taking structure into account. However, none of the existing proposals fully accounts for structure and combines it with content to score query answers. We propose novel XML scoring methods that are inspired by tf*idf and that account for both structure and content while considering query relaxations. Twig scoring accounts for the most structure and content and is thus used as our reference method. Path scoring is an approximation that loosens correlations between query nodes, hence reducing the amount of time required to manipulate scores during top-k query processing. We propose efficient data structures to speed up ranked query processing. We run extensive experiments that validate our scoring methods and show that path scoring provides very high precision while improving score computation time.
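For readers unfamiliar with the pure content baseline the abstract mentions, the following is a minimal sketch of classic tf*idf over a flat token corpus. It is not the twig or path scoring proposed in the paper, and it ignores structure entirely. The function name `tf_idf` and the toy corpus are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Classic content-only tf*idf of `term` in one document.

    Illustrative sketch: `corpus` is a list of token lists, and no
    XML structure is taken into account.
    """
    if not doc_tokens:
        return 0.0
    # Term frequency: share of the document's tokens equal to `term`.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Document frequency: number of documents containing `term`.
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    # Inverse document frequency: rarer terms weigh more.
    idf = math.log(len(corpus) / df)
    return tf * idf


# Hypothetical toy corpus, one token list per document.
corpus = [
    ["xml", "query", "ranking"],
    ["xml", "schema"],
    ["database", "index", "query"],
]
scores = [tf_idf("ranking", d, corpus) for d in corpus]
```

Here only the first document contains "ranking", so it is the only one with a nonzero score. A term that appears in more documents, such as "xml", receives a lower idf and therefore a lower score in that same document.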
Year
2005
Venue
VLDB
Keywords
content scoring, score query answer, twig scoring, efficient data structure, query relaxation, scoring method, pure content, query node, query processing, novel xml scoring method, path scoring, data structure, oscillations
Field
Data structure, Data mining, Information retrieval, Ranking, XML, Computer science, Database, Speedup, Computation
DocType
Conference
ISBN
1-59593-154-6
Citations
92
PageRank
3.28
References
14
Authors
5
Name                 Order   Citations/PageRank
Sihem Amer-Yahia     1       2400176.15
Nick Koudas          2       6424566.00
Amélie Marian        3       128077.92
Divesh Srivastava    4       89841191.22
David Toman          5       1045148.21