Title
An Entropy-Based Characterization of the Heterogeneity of XML Collections
Abstract
The concept of heterogeneity is very important in XML data management, since many common applications must deal with large and complex collections which do not conform to a schema. Heterogeneity in XML collections can be present at many different levels (textual and structural) and needs to be addressed from several perspectives. This paper contributes a formal characterization of heterogeneity in XML collections based on information-theoretic considerations. We show how it can be applied in some important use cases, and we demonstrate its effectiveness by using it to analyze a number of relevant XML collections and retrieval approaches found in the literature. We show that a large space of highly heterogeneous collections has not been adequately addressed by these approaches.
Year: 2008
DOI: 10.1109/DEXA.2008.55
Venue: DEXA Workshops
Keywords: common application, large space, important use case, complex collection, different level, entropy-based characterization, formal characterization, xml collections, xml data management, heterogeneous collection, relevant xml collection, xml collection, entropy, xml, use case, data management, distributed processing, indexes
Field: Data mining, Use case, Information retrieval, XML, Computer science, Xml data, Atmospheric measurements, Schema (psychology), Database
DocType: Conference
ISSN: 1529-4188
Citations: 2
PageRank: 0.36
References: 8
Authors: 4
Name                Order   Citations   PageRank
Ismael Sanz         1       102         19.17
Marco Mesiti        2       830         72.53
Giovanna Guerrini   3       705         97.44
Rafael Berlanga     4       405         23.71