Title | ||
---|---|---|
Machine learning techniques for XML (co-)clustering by structure-constrained phrases. |
Abstract | ||
---|---|---|
A new method is proposed for clustering XML documents by structure-constrained phrases. It is implemented through three machine-learning approaches previously unexplored in the XML domain, namely non-negative matrix (tri-)factorization, co-clustering and automatic transactional clustering. A novel class of XML features approximately captures structure-constrained phrases as n-grams contextualized by root-to-leaf paths. Experiments on real-world benchmark XML corpora show that the effectiveness of all three approaches improves with contextualized n-grams of suitable length, which confirms the validity of the method from multiple clustering perspectives. Two of the approaches outperform several state-of-the-art competitors in effectiveness. The scalability of the three approaches is also investigated. |
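
To make the idea of n-grams contextualized by root-to-leaf paths more concrete, here is a minimal Python sketch. It is not the authors' implementation: the function name `contextualized_ngrams`, the whitespace tokenization, the lowercasing, and the restriction to text in leaf elements are all assumptions made for illustration, since this record does not specify how the features are actually built.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def contextualized_ngrams(xml_text, n=2):
    """Count (root-to-leaf path, n-gram) pairs for one XML document.

    Hypothetical helper: tokenization and path encoding are assumptions,
    not the feature definition used in the paper.
    """
    root = ET.fromstring(xml_text)
    features = Counter()

    def visit(element, path):
        path = path + (element.tag,)
        children = list(element)
        if not children:
            # Leaf element: tokenize its text and slide a window of n tokens.
            tokens = (element.text or "").lower().split()
            for i in range(len(tokens) - n + 1):
                ngram = tuple(tokens[i:i + n])
                features[("/".join(path), ngram)] += 1
        for child in children:
            visit(child, path)

    visit(root, ())
    return features


# Toy document (made up for this example).
doc = (
    "<paper><title>xml clustering by structure</title>"
    "<body><sec>xml clustering methods</sec></body></paper>"
)
features = contextualized_ngrams(doc, n=2)
for (path, ngram), count in sorted(features.items()):
    print(path, " ".join(ngram), count)
```

In this sketch the same bigram under two different paths (`paper/title` and `paper/body/sec`) yields two distinct features, which is the sense in which the structural context constrains the phrase. The resulting counts could serve as the columns of a document-by-feature matrix for a factorization or co-clustering step.
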
Year | DOI | Venue |
---|---|---|
2018 | https://doi.org/10.1007/s10791-017-9314-x | Inf. Retr. Journal |
Keywords | Field | DocType |
XML,Semi-structured data analysis,XML (co-)clustering by structure and nested text,Structure-constrained phrases,Contextualized n-grams | Data mining,Efficient XML Interchange,XML,Computer science,XML validation,XML schema,Simple API for XML,Biclustering,Cluster analysis,Scalability | Journal |
Volume | Issue | ISSN |
21 | 1 | 1386-4564 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Gianni Costa | 1 | 235 | 24.04 |
Riccardo Ortale | 2 | 282 | 27.46 |