Title | ||
---|---|---|
A Latent Semantic Approach to XML Clustering by Content and Structure Based on Non-negative Matrix Factorization |
Abstract | ||
---|---|---|
Non-negative matrix factorization is widely used in text clustering. We investigate its use in the XML domain to cluster XML documents by both structure and content into topically homogeneous groups. The factorization is computed with an alternating least squares method that incorporates techniques to reduce the cost of large-scale factorizations, which is especially relevant when processing massive text-centric XML corpora. A comparative evaluation on real-world XML corpora shows that our approach outperforms several state-of-the-art competitors in clustering effectiveness. |
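The abstract does not specify the document representation or the expedients used for large-scale factorization, so the following is only a minimal sketch under stated assumptions. It hypothetically represents each XML document by counts of (root-to-element tag path, term) pairs as a stand-in for "structure and content", factorizes the resulting feature-by-document matrix with plain projected alternating least squares (an unconstrained least-squares solve followed by clipping negatives to zero), and assigns each document to the latent topic with the largest coefficient. The function names `path_term_features`, `build_matrix`, `init_from_columns` and `nmf_als`, the column-based initialization, and the toy corpus are illustrative assumptions, not the authors' method.

```python
import xml.etree.ElementTree as ET
from collections import Counter

import numpy as np


def path_term_features(xml_text):
    """Count (root-to-element tag path, lowercase term) pairs in one XML document.

    Hypothetical representation: only each element's .text is tokenized
    (tail text is ignored) and terms are split on whitespace.
    """
    counts = Counter()

    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        for term in (elem.text or "").lower().split():
            counts[(path, term)] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def build_matrix(docs):
    """Build a nonnegative feature-by-document count matrix A (features x docs)."""
    feats = [path_term_features(d) for d in docs]
    vocab = sorted(set().union(*feats))
    index = {f: i for i, f in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, counts in enumerate(feats):
        for f, v in counts.items():
            A[index[f], j] = v
    return A


def init_from_columns(A, k):
    """Deterministic start: pick k mutually dissimilar document columns (farthest-first by cosine)."""
    norms = np.linalg.norm(A, axis=0)
    U = A / np.where(norms == 0, 1.0, norms)  # unit-length document vectors
    chosen = [0]
    for _ in range(1, k):
        sim = (U[:, chosen].T @ U).max(axis=0)  # similarity to the closest chosen doc
        sim[chosen] = np.inf  # never pick the same column twice
        chosen.append(int(np.argmin(sim)))
    return A[:, chosen]  # fancy indexing returns a copy


def nmf_als(A, k, iters=100):
    """Projected ALS: A ~= W @ H with W (features x k) and H (k x docs), both >= 0."""
    W = init_from_columns(A, k)
    for _ in range(iters):
        # Fix W, solve the unconstrained least-squares problem for H, clip negatives.
        H = np.maximum(np.linalg.lstsq(W, A, rcond=None)[0], 0.0)
        # Fix H, solve for W via the transposed system, clip negatives.
        W = np.maximum(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0.0)
    return W, H


docs = [
    "<article><title>matrix factorization</title><body>nonnegative matrix factorization</body></article>",
    "<article><title>matrix methods</title><body>matrix factorization clustering</body></article>",
    "<recipe><name>tomato soup</name><step>boil tomato</step></recipe>",
    "<recipe><name>onion soup</name><step>fry onion</step></recipe>",
]
A = build_matrix(docs)
W, H = nmf_als(A, k=2)
labels = H.argmax(axis=0)  # each document joins the topic with the largest coefficient
print(labels)  # → [0 0 1 1]
```

Clipping after an unconstrained solve is the simplest ALS variant and is a heuristic without a general convergence guarantee; each update only requires least-squares solves with k unknowns per row or column, which is part of what makes ALS attractive for large, sparse corpora.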
Year | DOI | Venue |
---|---|---|
2013 | 10.1109/ICMLA.2013.38 | ICMLA (1) |
Keywords | Field | DocType |
squares method,clustering xml document,non-negative matrix factorization,state-of-the-art competitor,text clustering,large-scale factorization,xml domain,comparative evaluation,latent semantic approach,empirical evidence,massive text-centric xml corpus,matrix decomposition,xml,text analysis | Data mining,XML,Homogeneous,Document clustering,Computer science,Matrix decomposition,Document-term matrix,Non-negative matrix factorization,Alternating least squares,Cluster analysis | Conference |
Citations | PageRank | References |
6 | 0.42 | 13 |
Authors | ||
---|---|---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Gianni Costa | 1 | 235 | 24.04 |
Riccardo Ortale | 2 | 282 | 27.46 |