Abstract |
---|
In this paper, we describe a method for clustering XML documents whose goal is to group documents that share similar structures. Our approach has two steps. We first automatically extract the structure of each XML document to be classified. This extracted structure is then used as a representation model to classify the corresponding XML document. The idea behind the clustering is that XML documents sharing similar structures are more likely to correspond to the structural part of the same query. Finally, we tested our algorithms on both real data (the ACM SIGMOD Record corpus) and synthetic data. The results demonstrate the value of our approach. |
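
The two steps named in the abstract (extract a structural representation, then group documents by structural similarity) can be illustrated with a minimal sketch. This is not the paper's algorithm: the root-to-node tag-path representation, the Jaccard similarity, the greedy single-pass grouping, the `threshold` value, and the helper names `tag_paths`, `jaccard`, and `cluster` are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Step 1 (assumed representation): the set of root-to-node tag paths.

    Text content and attributes are ignored, so only structure remains.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Jaccard similarity of two path sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(docs, threshold=0.5):
    """Step 2 (assumed strategy): greedy single-pass clustering.

    Each document joins the first cluster whose representative (the path
    set of its first member) is at least `threshold` similar; otherwise
    it starts a new cluster. Returns lists of document indices.
    """
    clusters = []  # pairs of (representative path set, member indices)
    for i, text in enumerate(docs):
        paths = tag_paths(text)
        for rep, members in clusters:
            if jaccard(rep, paths) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((paths, [i]))
    return [members for _, members in clusters]


# Hypothetical toy documents, not taken from the SIGMOD Record corpus.
docs = [
    "<article><title>A</title><author>X</author></article>",
    "<article><title>B</title><author>Y</author><author>Z</author></article>",
    "<issue><volume>1</volume><number>2</number></issue>",
]
print(cluster(docs))
```

In this sketch the first two documents share the same path set and fall into one cluster, while the third shares no paths with them and forms its own. A set of paths discards repetition and sibling order, so a representation that needs those distinctions would have to keep more of the tree.
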
Year | DOI | Venue |
---|---|---|
2012 | 10.1007/s10115-011-0421-5 | Knowl. Inf. Syst. |
Keywords | Field | DocType |
group document,structural similarity,clustering xml document,structural part,representation model,corresponding xml document,synthetic data,similar structure,xml document,clustering,context,node,similarity,structural classification,threshold,tree,experimentation purpose,acm sigmod record corpus | Fuzzy clustering,Data mining,XML Schema (W3C),Information retrieval,XML,Computer science,XML validation,Structural classification,Synthetic data,Simple API for XML,Cluster analysis | Journal |
Volume | Issue | ISSN |
32 | 1 | 0219-3116 |
Citations | PageRank | References |
7 | 0.43 | 41 |
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ali Aïtelhadj | 1 | 8 | 0.77 |
Mohand Boughanem | 2 | 923 | 109.00 |
Mohamed Mezghiche | 3 | 25 | 11.68 |
Fatiha Souam | 4 | 8 | 1.11 |