Title
Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology
Abstract
This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or more generally an organisational structure. Our approach integrates techniques for extracting knowledge from documents with unsupervised classification (clustering) of documents. We focus on the feature selection used for representing documents and its impact on the emerging classification. We mix the selection of structured features with fine textual selection based on syntactic characteristics. We illustrate and evaluate this approach with a collection of Inria activity reports for the year 2003. The objective is to cluster projects into larger groups (Themes), based on the keywords or different chapters of these activity reports. We then compare the results of clustering using different feature selections, with the official theme structure used by Inria.
Year
2005
Venue
Clinical Orthopaedics and Related Research
Keywords
categorisation, knowledge discovery, organisational structure, xml clustering, xml document, feature selection
Field
Data mining, Information retrieval, XML, Organizational structure, Feature selection, Homogeneous, Computer science, Typology, Knowledge extraction, Cluster analysis, Syntax
DocType
Journal
Volume
abs/cs/050
Citations
3
PageRank
0.43
References
7
Authors
4
Name                    Order  Citations  PageRank
Thierry Despeyroux      1      139        26.04
Yves Lechevallier       2      333        33.02
Brigitte Trousse        3      404        52.87
Anne-Marie Vercoustre   4      331        81.83