Title
A flexible structured-based representation for XML document mining
Abstract
This paper reports on the INRIA group’s approach to XML mining while participating in the INEX XML Mining track 2005. We use a flexible representation of XML documents that allows taking into account the structure only or both the structure and content. Our approach consists of representing XML documents by a set of their sub-paths, defined according to some criteria (length, root beginning, leaf ending). By considering those sub-paths as words, we can use standard methods for vocabulary reduction, and simple clustering methods such as k-means. We use an implementation of the clustering algorithm known as dynamic clouds that can work with distinct groups of independent modalities put in separate variables. This is useful in our model since embedded sub-paths are not independent: we split potentially dependent paths into separate variables, resulting in each of them containing independent paths. Experiments with the INEX collections show good results for the structure-only collections, but our approach could not scale well for large structure-and-content collections.
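The sub-path representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`root_to_leaf_paths`, `sub_paths`) and the parameters (`min_len`, `max_len`, `from_root`, `to_leaf`) are hypothetical, chosen only to mirror the three criteria the abstract names (length, root beginning, leaf ending). The sketch covers structure only, ignores text content, and treats an element with no child elements as a leaf.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return every root-to-leaf tag path as a tuple of tag names."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:
            # An element without child elements is treated as a leaf.
            paths.append(current)
        for child in children:
            walk(child, current)

    walk(root, ())
    return paths


def sub_paths(xml_text, min_len=1, max_len=3, from_root=False, to_leaf=False):
    """Return the set of contiguous tag sub-paths that satisfy the criteria.

    min_len / max_len: bounds on the number of tags in a sub-path.
    from_root: keep only sub-paths that start at the document root.
    to_leaf: keep only sub-paths that end at a leaf element.
    (All four parameters are illustrative assumptions.)
    """
    result = set()
    for path in root_to_leaf_paths(xml_text):
        n = len(path)
        for start in range(n):
            if from_root and start != 0:
                break
            for end in range(start + min_len, min(start + max_len, n) + 1):
                if to_leaf and end != n:
                    continue
                result.add("/".join(path[start:end]))
    return result


doc = "<article><title>t</title><body><sec><p>x</p></sec></body></article>"
pairs = sub_paths(doc, min_len=2, max_len=2)
rooted = sub_paths(doc, min_len=1, max_len=3, from_root=True)
```

Each resulting string can then be treated as a "word", so that a document becomes a bag of sub-paths suitable for standard vocabulary reduction and for clustering methods such as k-means.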
Year: 2006
DOI: 10.1007/978-3-540-34963-1_34
Venue: INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Keywords: independent modality, xml mining, simple clustering method, inria group, xml document mining, embedded sub-paths, inex xml mining track, separate variable, clustering algorithm, flexible structured-based representation, inex collection, xml document, xml, k means, clustering
Field: XML framework, Efficient XML Interchange, Information retrieval, Computer science, XML validation, Document Structure Description, XML database, XML schema, Simple API for XML, XML Catalog
DocType:
Journal
Volume
ISSN
ISBN
abs/cs/0607012
In The Fourth International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2005)
3-540-34962-6
Citations: 15
PageRank: 0.88
References: 22
Authors (4)
Name                    Order  Citations  PageRank
Anne-Marie Vercoustre   1      331        81.83
Mounir Fegas            2      19         1.37
Saba Gul                3      15         0.88
Yves Lechevallier       4      333        33.02