Title
Semantic and Structure Based XML Similarity: An Integrated Approach
Abstract
Over the last decade, XML has gained growing importance as a major means of information management and has become unavoidable for complex data representation. Given the unprecedented growth in the use of the XML standard, developing efficient techniques for comparing XML-based documents has become crucial in information retrieval (IR) research. A range of algorithms for comparing hierarchically structured data, e.g. XML documents, have been proposed in the literature. However, to our knowledge, most of them focus exclusively on comparing documents based on structural features, overlooking the semantics involved. In this paper, we address this problem and introduce a combined structural/semantic XML similarity approach. Our method integrates IR semantic similarity assessment into an edit distance algorithm, seeking to improve similarity judgments when comparing XML-based documents. Unlike previous work, our approach relies on an original edit distance operation cost model that introduces the semantic relatedness of XML element/attribute labels into traditional edit distance computations. We discuss the properties of our similarity method, chiefly symmetry and the triangle inequality, with respect to existing measures in the literature. A prototype has been developed to evaluate the performance of our approach. Experimental results were notable.
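The idea of folding label relatedness into edit distance costs can be illustrated with a minimal sketch. This is not the paper's algorithm or cost model: it assumes each document is flattened to an ordered list of element labels, it uses a plain string-style edit distance rather than a tree edit distance, and the relatedness table `TOY_SIMILARITY` and both function names are hypothetical. Insertions and deletions cost 1, while updating one label to another costs 1 minus the relatedness of the two labels.

```python
from typing import Dict, FrozenSet, Sequence

# Hypothetical relatedness scores in [0, 1], made up for illustration only.
# A real system would derive them from a knowledge base or corpus statistics.
TOY_SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset({"author", "writer"}): 0.9,
    frozenset({"title", "name"}): 0.6,
}


def label_similarity(a: str, b: str) -> float:
    """Return the assumed relatedness of two labels (1.0 if identical)."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({a, b}), 0.0)


def semantic_edit_distance(src: Sequence[str], dst: Sequence[str]) -> float:
    """Edit distance over label sequences with a relatedness-weighted update cost."""
    rows, cols = len(src) + 1, len(dst) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = float(i)  # delete every remaining source label
    for j in range(1, cols):
        dist[0][j] = float(j)  # insert every remaining target label
    for i in range(1, rows):
        for j in range(1, cols):
            # Closely related labels are cheap to update; unrelated ones cost 1.
            update = 1.0 - label_similarity(src[i - 1], dst[j - 1])
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,         # deletion
                dist[i][j - 1] + 1.0,         # insertion
                dist[i - 1][j - 1] + update,  # update (free when labels match)
            )
    return dist[-1][-1]
```

Under these assumptions, the label lists `["book", "author", "title"]` and `["book", "writer", "title"]` end up far closer than a purely structural comparison would report, because the only differing labels are treated as near-synonyms. Because the toy relatedness function is symmetric and insertions and deletions share one cost, this sketch is also symmetric in its two arguments.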
Year
2006
Venue
COMAD
Keywords
information retrieval, edit distance, information management, semantic similarity, complex data, semantic relatedness, xml document
Field
Data mining, XML framework, Efficient XML Interchange, Information retrieval, Computer science, XML validation, Document Structure Description, XML database, XML schema, Database, XML Schema Editor, XML Signature
DocType
Conference
Citations
9
PageRank
0.50
References
23
Authors
3
Name              Order  Citations  PageRank
Joe Tekli         1      204        20.30
Richard Chbeir    2      691        82.42
Kokou Yétongnon   3      218        61.29