Title
Using structural similarity for clustering XML documents
Abstract
In this paper, we describe a method for clustering XML documents so that documents sharing similar structures are grouped together. Our approach has two steps. We first automatically extract the structure of each XML document to be classified. This extracted structure then serves as the representation model used to classify the corresponding document. The idea behind the clustering is that XML documents sharing similar structures are more likely to match the structural part of the same query. Finally, we tested our algorithms experimentally on both real data (the ACM SIGMOD Record corpus) and synthetic data. The results demonstrate the value of our approach.
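The two steps described in the abstract (structure extraction, then grouping by structural similarity) can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the representation or the similarity measure, so the sketch assumes a set of root-to-node tag paths as the extracted structure, Jaccard similarity between those sets, and a greedy single-pass clustering against a hypothetical threshold of 0.5. The names `tag_paths`, `jaccard`, `cluster`, and `DOCS` are all illustrative.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Step 1 (assumed representation): the set of root-to-node tag paths.

    Text content and attributes are ignored; only element nesting is kept.
    """
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Assumed similarity measure: |a & b| / |a | b| over two path sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs, threshold=0.5):
    """Step 2 (assumed strategy): greedy single-pass clustering.

    Each document joins the first cluster whose representative (the path set
    of the cluster's first member) is at least `threshold` similar;
    otherwise it starts a new cluster. Returns lists of document indices.
    """
    clusters = []  # list of (representative_paths, member_indices)
    for index, text in enumerate(docs):
        paths = tag_paths(text)
        for representative, members in clusters:
            if jaccard(paths, representative) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((paths, [index]))
    return [members for _, members in clusters]


# Toy documents: the first two share a structure, the third does not.
DOCS = [
    "<article><title>A</title><author>X</author></article>",
    "<article><title>B</title><author>Y</author><author>Z</author></article>",
    "<issue><volume>1</volume><number>2</number></issue>",
]

if __name__ == "__main__":
    print(cluster(DOCS))  # [[0, 1], [2]]
```

Because repeated sibling elements collapse into a single path, the second toy document has the same path set as the first even though it has two `author` elements. A measure sensitive to element counts or sibling order would need a richer representation than the one assumed here.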
Year: 2012
DOI: 10.1007/s10115-011-0421-5
Venue: Knowl. Inf. Syst.
Keywords: group document, structural similarity, clustering xml document, structural part, representation model, corresponding xml document, synthetic data, similar structure, xml document, clustering · context · node · similarity · structural classification · threshold · tree, experimentation purpose, acm sigmod record corpus
Field: Fuzzy clustering, Data mining, XML Schema (W3C), Information retrieval, XML, Computer science, XML validation, Structural classification, Synthetic data, Simple API for XML, Cluster analysis
DocType: Journal
Volume: 32
Issue: 1
ISSN: 0219-3116
Citations: 7
PageRank: 0.43
References: 41
Authors: 4
Name               Order   Citations   PageRank
Ali Aïtelhadj      1       8           0.77
Mohand Boughanem   2       923         109.00
Mohamed Mezghiche  3       25          11.68
Fatiha Souam       4       8           1.11