Title
Measuring similarity between collections of values
Abstract
In this paper, we propose a set of similarity metrics for manipulating collections of values occurring in XML documents. Following the data model presented in the TAX algebra, we treat an XML element as a labeled, ordered, rooted tree. Considering that XML nodes can be either atomic, i.e., they may contain single values such as short character strings, dates, etc., or complex, i.e., nested structures that contain other nodes, we propose two types of similarity metrics: MAVs, for atomic nodes, and MCVs, for complex nodes. In the first case, we suggest the use of several application-domain-dependent metrics. In the second case, we define metrics for complex values that are structure dependent and can be applied distinctly to individual complex values and to collections of values. We also present experiments showing the effectiveness of our method.
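The split between atomic and complex nodes can be illustrated with a minimal sketch. It does not reproduce the paper's MAV or MCV definitions, which the abstract does not give. The function names (`atomic_sim`, `collection_sim`, `node_sim`), the use of `difflib.SequenceMatcher` as the atomic metric, and the best-match averaging over children are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
import xml.etree.ElementTree as ET


def atomic_sim(a: str, b: str) -> float:
    """Hypothetical atomic-value metric: normalized string similarity in [0, 1].

    A real application would swap in a domain-dependent metric
    (dates, names, numbers); this is only a stand-in.
    """
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def collection_sim(xs, ys) -> float:
    """Hypothetical collection metric: each node in xs is paired with its
    best-scoring same-tag node in ys, and the scores are averaged over the
    size of the larger collection, so unmatched nodes lower the result."""
    if not xs and not ys:
        return 1.0
    total = 0.0
    for x in xs:
        total += max((node_sim(x, y) for y in ys if y.tag == x.tag), default=0.0)
    return total / max(len(xs), len(ys))


def node_sim(x: ET.Element, y: ET.Element) -> float:
    """Atomic nodes compare their text; complex nodes compare their children."""
    x_children, y_children = list(x), list(y)
    if not x_children and not y_children:
        return atomic_sim(x.text or "", y.text or "")
    if not x_children or not y_children:
        return 0.0  # assumption: an atomic node never matches a complex one
    return collection_sim(x_children, y_children)


a = ET.fromstring(
    "<author><name>Carlos Heuser</name><city>Porto Alegre</city></author>"
)
b = ET.fromstring(
    "<author><name>Carlos A. Heuser</name><city>Porto Alegre</city></author>"
)
same_score = node_sim(a, a)
near_score = node_sim(a, b)
print(same_score, near_score)
```

In this sketch an element compared with itself scores 1.0, while the two `author` elements, which differ only by a middle initial, score slightly below 1.0. Note that the best-match averaging is not guaranteed to be symmetric when tags repeat within a collection.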
Year: 2004
DOI: 10.1145/1031453.1031465
Venue: WIDM
Keywords: similarity metrics, xml element, tax algebra, complex value, application domain dependent metrics, atomic node, data model, complex node, xml node, xml document, xml
Field: Data mining, XML, Information retrieval, Computer science, XML validation, Application domain, Data model
DocType: Conference
ISBN: 1-58113-978-0
Citations: 19
PageRank: 1.01
References: 15
Authors: 5
Name | Order | Citations | PageRank
Carina F. Dorneles | 1 | 61 | 10.35
Carlos A. Heuser | 2 | 384 | 61.97
Andrei E. N. Lima | 3 | 19 | 1.01
Altigran Soares da Silva | 4 | 718 | 65.15
Edleno Silva de Moura | 5 | 988 | 75.44