Title
XML clustering: a review of structural approaches.
Abstract
With its presence in data integration, chemistry, biology, and geographic systems, eXtensible Markup Language (XML) has become an important standard, and not only in computer science. A common problem among these applications is the structural clustering of XML documents, an issue that has been thoroughly studied and has led to the creation of a myriad of approaches. In this paper, we present a comprehensive review of structural XML clustering. First, we provide a basic introduction to the problem and highlight the main challenges in this research area. Subsequently, we divide the problem into three subtasks and discuss the most common document representations, structural similarity measures, and clustering algorithms. In addition, we present the most popular evaluation measures, which can be used to estimate clustering quality. Finally, we analyze and compare 23 state-of-the-art approaches and arrange them in an original taxonomy. By providing an up-to-date analysis of existing structural XML clustering algorithms, we hope to showcase methods suitable for current applications and draw lines of future research.
Year
2015
DOI
10.1017/S0269888914000216
Venue
KNOWLEDGE ENGINEERING REVIEW
Field
Data integration, World Wide Web, XML, Information retrieval, Computer science, Cluster analysis, Management science
DocType
Journal
Volume
30
Issue
SP3.0
ISSN
0269-8889
Citations
6
PageRank
0.45
References
50
Authors
4
Name                  Order   Citations   PageRank
Maciej Piernik        1       13          2.60
Dariusz Brzezinski    2       213         11.28
Tadeusz Morzy         3       487         282.62
Anna Lesniewska       4       6           0.45