Title
A Latent Semantic Approach to XML Clustering by Content and Structure Based on Non-negative Matrix Factorization
Abstract
Non-negative matrix factorization is widely used in text clustering. We investigate its use in the XML domain for clustering XML documents by structure and content into topically homogeneous groups. The factorization is computed with an alternating least squares method that incorporates measures to reduce the computational burden of large-scale factorizations, which is especially relevant when massive text-centric XML corpora are processed. Empirical evidence from a comparative evaluation on real-world XML corpora shows that our approach outperforms several state-of-the-art competitors in effectiveness.
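The abstract does not give the update rules or the specific measures used to cut the cost of large-scale factorizations, so the following is only a rough illustration of the general technique, not the authors' method. It is a minimal Python sketch of a basic projected alternating least squares NMF: each half-step solves an unconstrained least-squares problem and then sets negative entries to zero. It assumes the XML corpus has already been encoded as a non-negative document-by-feature matrix `X` (how structure and content are mapped to features is not described here), and it assigns each document to the latent topic with the largest weight. The names `als_nmf`, `cluster_labels`, and the optional initial matrix `W0` are hypothetical.

```python
import numpy as np


def als_nmf(X, k, iters=100, seed=0, W0=None):
    """Basic projected ALS for NMF (illustrative sketch): X ~= W @ H.

    X  : non-negative (n_docs x n_feats) matrix, assumed already built.
    W  : (n_docs x k) document-to-topic weights.
    H  : (k x n_feats) topic-to-feature weights.
    W0 : hypothetical optional initial W; random uniform if omitted.
    Projected ALS has no general convergence guarantee; no large-scale
    optimizations are included here.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) if W0 is None else np.array(W0, dtype=float)
    H = np.zeros((k, X.shape[1]))
    for _ in range(iters):
        # H-step: least-squares solve of W @ H = X, then clamp H to H >= 0.
        H = np.maximum(np.linalg.lstsq(W, X, rcond=None)[0], 0.0)
        # W-step: least-squares solve of H.T @ W.T = X.T, then clamp W to W >= 0.
        W = np.maximum(np.linalg.lstsq(H.T, X.T, rcond=None)[0].T, 0.0)
    return W, H


def cluster_labels(W):
    """Assign each document (row of W) to its highest-weight latent topic."""
    return np.argmax(W, axis=1)


# Toy usage: two groups of documents that share no features.
X = np.array([[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3, dtype=float)
W, H = als_nmf(X, k=2, W0=[[2, 1]] * 3 + [[1, 2]] * 3)
print(cluster_labels(W))  # prints [0 0 0 1 1 1]
```

With a random start (omitting `W0`), plain projected ALS can occasionally stall with a zeroed column, which is one reason practical implementations add safeguards such as careful initialization; the fixed `W0` above only makes the toy run reproducible.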
Year: 2013
DOI: 10.1109/ICMLA.2013.38
Venue: ICMLA (1)
Keywords: squares method, clustering xml document, non-negative matrix factorization, state-of-the-art competitor, text clustering, large-scale factorization, xml domain, comparative evaluation, latent semantic approach, empirical evidence, massive text-centric xml corpus, matrix decomposition, xml, text analysis
Field: Data mining, XML, Homogeneous, Document clustering, Computer science, Matrix decomposition, Document-term matrix, Non-negative matrix factorization, Alternating least squares, Cluster analysis
DocType: Conference
Citations: 6
PageRank: 0.42
References: 13
Authors: 2
Name             Order  Citations  PageRank
Gianni Costa     1      235        24.04
Riccardo Ortale  2      282        27.46