Abstract |
---|
The concept of heterogeneity is very important in XML data management, since many common applications must deal with large and complex collections which do not conform to a schema. Heterogeneity in XML collections can be present at many different levels (textual and structural) and needs to be addressed from several perspectives. This paper contributes a formal characterization of heterogeneity in XML collections based on information-theoretic considerations. We show how it can be applied in some important use cases, and we demonstrate its effectiveness by using it to analyze a number of relevant XML collections and retrieval approaches found in the literature. We show that a large space of highly heterogeneous collections has not been adequately addressed by these approaches. |
Year | DOI | Venue |
---|---|---|
2008 | 10.1109/DEXA.2008.55 | DEXA Workshops |
Keywords | Field | DocType |
common application,large space,important use case,complex collection,different level,entropy-based characterization,formal characterization,xml collections,xml data management,heterogeneous collection,relevant xml collection,xml collection,entropy,xml,use case,data management,distributed processing,indexes | Data mining,Use case,Information retrieval,XML,Computer science,Xml data,Atmospheric measurements,Schema (psychology),Database | Conference |
ISSN | Citations | PageRank |
1529-4188 | 2 | 0.36 |
References | Authors |
8 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ismael Sanz | 1 | 102 | 19.17 |
Marco Mesiti | 2 | 830 | 72.53 |
Giovanna Guerrini | 3 | 705 | 97.44 |
Rafael Berlanga | 4 | 405 | 23.71 |