Refined experts: improving classification in large taxonomies - Citegraph

Paper Info

Title
Refined experts: improving classification in large taxonomies

Abstract
While large-scale taxonomies--especially for web pages--have been in existence for some time, approaches to automatically classify documents into these taxonomies have met with limited success compared to the more general progress made in text classification. We argue that this stems from three causes: increasing sparsity of training data at deeper nodes in the taxonomy, error propagation where a mistake made high in the hierarchy cannot be recovered, and increasingly complex decision surfaces in higher nodes in the hierarchy. While prior research has focused on the first problem, we introduce methods that target the latter two problems--first by biasing the training distribution to reduce error propagation and second by propagating up "first-guess" expert information in a bottom-up manner before making a refined top-down choice. Finally, we present an empirical study demonstrating that the suggested changes lead to 10--30% improvements in F1 scores versus an accepted competitive baseline, hierarchical SVMs.
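The error-propagation problem named in the abstract can be illustrated with a small sketch. This is not the paper's method: the taxonomy, the keyword sets, and the functions `score`, `classify_top_down`, and `best_leaf` are all hypothetical, and simple keyword overlap stands in for a trained classifier at each node. It only shows how a greedy top-down pass cannot recover from a wrong choice at the top level, even when a leaf elsewhere in the tree matches the document well.

```python
# Toy taxonomy (hypothetical labels): each internal node maps to its children.
TAXONOMY = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["soccer", "tennis"],
}

# Hypothetical stand-in for per-node classifiers: one keyword set per node.
KEYWORDS = {
    "science": {"experiment", "theory"},
    "sports": {"match", "score", "field"},
    "physics": {"quantum", "field", "particle"},
    "biology": {"cell", "gene"},
    "soccer": {"goal", "match"},
    "tennis": {"serve", "racket"},
}


def score(node, tokens):
    """Number of the node's keywords that appear in the document."""
    return len(KEYWORDS[node] & set(tokens))


def classify_top_down(tokens):
    """Greedy top-down routing: commit to the best child at every level."""
    node, path = "root", []
    while node in TAXONOMY:
        node = max(TAXONOMY[node], key=lambda child: score(child, tokens))
        path.append(node)
    return path


def best_leaf(tokens):
    """Flat comparison over all leaves, ignoring the hierarchy."""
    children = [c for kids in TAXONOMY.values() for c in kids]
    leaves = [c for c in children if c not in TAXONOMY]
    return max(leaves, key=lambda leaf: score(leaf, tokens))


doc = ["quantum", "field", "particle", "score"]
print(classify_top_down(doc))  # ['sports', 'soccer']
print(best_leaf(doc))          # physics
```

In this toy case the top-level scorer prefers "sports" because "field" and "score" are ambiguous, so the document never reaches "physics", the leaf that matches it best. The abstract's two proposals (biasing the training distribution, and passing bottom-up "first-guess" information to the upper levels) target this failure mode; the sketch does not implement either of them.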

Year	DOI	Venue
2009	10.1145/1571941.1571946	SIGIR
Keywords	Field	DocType
training data,complex decision surface,large taxonomy,f1 score,expert information,bottom-up manner,training distribution,empirical study,deeper node,error propagation,refined expert,accepted competitive baseline,web pages,bottom up,top down	Training set,Data mining,Propagation of uncertainty,Mistake,Information retrieval,Computer science,Support vector machine,Top-down and bottom-up design,Artificial intelligence,Hierarchy,Machine learning,Empirical research	Conference
Citations	PageRank	References
50	1.50	24
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Paul N. Bennett	1	1500	87.93
Nam Nguyen	2	331	16.64
