Analysis of structural relationships for hierarchical cluster labeling - Citegraph

Paper Info

Title
Analysis of structural relationships for hierarchical cluster labeling

Abstract
Cluster label quality is crucial for browsing topic hierarchies obtained via document clustering. Intuitively, the hierarchical structure should influence the labeling accuracy. However, most labeling algorithms ignore such structural properties, and therefore the impact of hierarchical structures on labeling accuracy is still unclear. In our work we integrate hierarchical information, i.e., sibling and parent-child relations, into the cluster labeling process. We adapt standard labeling approaches, namely Maximum Term Frequency, Jensen-Shannon Divergence, Chi Square Test, and Information Gain, to make use of those relationships and evaluate their impact on four different datasets, namely the Open Directory Project, Wikipedia, TREC Ohsumed, and the CLEF IP European Patent dataset. We show that hierarchical relationships can be exploited to increase labeling accuracy, especially on high-level nodes.

Year: 2010
DOI: 10.1145/1835449.1835481
Venue: SIGIR
Keywords: information gain, cluster label quality, hierarchical relationship, hierarchical structure, open directory project, structural relationship, maximum term frequency, chi square test, jensen-shannon divergence, hierarchical information, hierarchical cluster, clef ip european patent, jensen shannon divergence, document clustering, hierarchical clustering, term frequency
Field: Hierarchical clustering, Cluster labeling, Data mining, Information retrieval, Directory, Document clustering, Computer science, Information gain, Artificial intelligence, Hierarchy, Clef, Machine learning
DocType: Conference
Citations: 13
PageRank: 0.76
References: 14
Authors: 3

Authors (3 rows)

Name	Order	Citations	PageRank
Markus Muhr	1	74	5.53
Roman Kern	2	350	45.08
Michael Granitzer	3	822	80.14
