Cost-sensitive learning for large-scale hierarchical classification - Citegraph

Paper Info

Title
Cost-sensitive learning for large-scale hierarchical classification

Abstract
We study hierarchical classification of products in electronic commerce, classifying a text description of a product into one of the leaf classes of a tree-structure taxonomy. In particular, we investigate two essential problems, performance evaluation and learning, in a synergistic way. Unless we know what is the appropriate performance evaluation metric for a task, we are not going to learn a classifier of maximum utility for the task. Given the characteristics of the task of hierarchical product classification, we shed insight into how and why common evaluation metrics such as error rate can be misleading, which is applicable for treating other real world applications. The analysis leads to a new performance evaluation metric that tailors this task to reflect a vendor's business goal of maximizing revenue. The proposed metric has an intuitive meaning as the average revenue loss, which depends on both the value of individual products and the hierarchical distance between the true class and the predicted class. Correspondingly, our learning algorithm uses multi-class SVM with margin re-scaling to optimize the proposed metric, instead of error rate or other common metrics. Margin re-scaling is sensitive to the scaling of loss functions. We propose a loss normalization approach to appropriately calibrating the scaling of loss functions, which is applicable to general classification and structured prediction tasks whenever using structured SVM with margin re-scaling. Experiments on a large dataset show that our approach outperforms standard multi-class SVM in terms of the proposed metric, effectively reducing the average revenue loss.
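
The abstract names the ingredients of the metric (product value and hierarchical distance) but gives no formula. The following is a minimal sketch of how such a revenue-weighted loss and a margin-rescaled multi-class hinge could look, not the paper's definition. The toy taxonomy `PARENT`, the helper names, and the choice of loss as value times tree distance divided by `max_distance` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy taxonomy (child -> parent); the root has parent None.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "electronics": "root",
    "clothing": "root",
    "phones": "electronics",
    "laptops": "electronics",
    "shirts": "clothing",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    cur: Optional[str] = node
    while cur is not None:
        path.append(cur)
        cur = PARENT[cur]
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges on the path between two nodes of the taxonomy."""
    depth_in_b = {n: i for i, n in enumerate(path_to_root(b))}
    for steps_a, n in enumerate(path_to_root(a)):
        if n in depth_in_b:  # first shared ancestor is the lowest one
            return steps_a + depth_in_b[n]
    raise ValueError("nodes are not in the same tree")


def revenue_loss(value: float, true_label: str, predicted: str,
                 max_distance: int) -> float:
    """Assumed per-example loss: product value scaled by normalized distance."""
    return value * tree_distance(true_label, predicted) / max_distance


def average_revenue_loss(values: Sequence[float],
                         true_labels: Sequence[str],
                         predicted_labels: Sequence[str],
                         max_distance: int) -> float:
    """Mean of the assumed per-example revenue loss over a dataset."""
    total = sum(
        revenue_loss(v, y, p, max_distance)
        for v, y, p in zip(values, true_labels, predicted_labels)
    )
    return total / len(values)


def margin_rescaled_hinge(scores: Dict[str, float], true_label: str,
                          value: float, max_distance: int) -> float:
    """Margin re-scaling: max over y of loss(y_true, y) + s(y) - s(y_true)."""
    return max(
        revenue_loss(value, true_label, y, max_distance) + s - scores[true_label]
        for y, s in scores.items()
    )


if __name__ == "__main__":
    print(average_revenue_loss(
        [100.0, 10.0], ["phones", "shirts"], ["laptops", "shirts"], 4))
```

Because the product value multiplies the loss directly, an expensive product would demand a much larger margin than a cheap one under this sketch. That sensitivity to loss scale is the issue the abstract's loss normalization is meant to address; the normalization itself is not reproduced here.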

Year	DOI	Venue
2013	10.1145/2505515.2505582	CIKM
Keywords	Field	DocType
proposed metric,loss normalization approach,large-scale hierarchical classification,loss function,appropriate performance evaluation,error rate,margin re-scaling,performance evaluation,average revenue loss,common evaluation metrics,new performance evaluation metric,product classification,taxonomy,svm	Structured support vector machine,Document classification,Data mining,Normalization (statistics),Computer science,Word error rate,Support vector machine,Structured prediction,Artificial intelligence,Classifier (linguistics),Product classification,Machine learning	Conference
Citations	PageRank	References
7	0.57	19
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Jianfu Chen	1	20	2.86
David Scott Warren	2	2447	480.41
