Title
A Loss Function Analysis for Classification Methods in Text Categorization
Abstract
This paper presents a formal analysis of popular text classification methods, focusing on the loss functions whose minimization is essential to optimizing those methods, and whose decomposition into the training-set loss and the model complexity enables cross-method comparisons on a common basis from an optimization point of view. The methods include Support Vector Machines, Linear Regression, Logistic Regression, Neural Networks, Naive Bayes, K-Nearest Neighbor, Rocchio-style, and Multi-class Prototype classifiers. Theoretical analysis (including our new derivations) is provided for each method, along with evaluation results for all the methods on the Reuters-21578 benchmark corpus. Using linear regression, neural networks, and logistic regression as examples, we show that properly tuning the balance between the training-set loss and the complexity penalty can have a significant impact on the performance of a classifier. In linear regression in particular, tuning the complexity penalty yielded a result (measured by macro-averaged F1) that outperformed all text categorization methods previously evaluated on that benchmark corpus, including Support Vector Machines.
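A minimal sketch of the loss decomposition the abstract describes, using ridge-regularized linear regression as the illustrative case; the regularizer weight `lam` stands in for the complexity-penalty knob being tuned (the synthetic data and function names are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 via the closed-form
    ridge solution (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def total_loss(X, y, w, lam):
    train = np.sum((X @ w - y) ** 2)   # training-set loss
    penalty = lam * np.sum(w ** 2)     # model-complexity penalty
    return train + penalty

# Tiny synthetic example: a larger penalty weight shrinks the
# learned weights toward zero, trading training fit for simplicity.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
w_small = ridge_fit(X, y, lam=0.01)
w_large = ridge_fit(X, y, lam=100.0)
```

Sweeping `lam` on held-out data is the kind of complexity-penalty tuning whose effect the paper measures across classifiers.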
Year
2003
Venue
ICML
Keywords
loss function
Field
Data mining, Computer science, Minification, Artificial intelligence, Classifier (linguistics), Text categorization, Artificial neural network, Logistic regression, Linear regression, Naive Bayes classifier, Pattern recognition, Support vector machine, Machine learning
DocType
Conference
Citations
32
PageRank
5.59
References
7
Authors
2
Name         Order  Citations  PageRank
Fan Li       1      391        4.25
Yiming Yang  2      32993      44.91