Exploiting term relationship to boost text classification - Citegraph

Paper Info

Title
Exploiting term relationship to boost text classification

Abstract
Document classification provides an effective way to handle the explosive growth of online textual data. However, in practical classification settings, we face the so-called feature sparsity problem, caused by a lack of training documents or by the shortness of the text to be classified. In this paper, we solve the sparsity problem by exploiting term relationships along with Naive Bayes classifiers. The first method estimates term relationships from the co-occurrence information of two terms in a certain context. The second method estimates term relationships from the distribution of terms over different hierarchical categories in a publicly available document taxonomy. The term relationships are then used to augment Naive Bayes classifiers. We test our methods on two open-domain data sets to demonstrate their advantages. The experimental results show that our methods can significantly improve classification performance, especially when there is not enough training data or when the texts are Web search queries.
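
The abstract does not give formulas, so the following is only an illustrative sketch of the general idea behind the first method, not the authors' actual algorithm. It assumes that the "context" is a whole document, that term relationships are normalized co-occurrence counts taken from an unlabeled background corpus, and that they augment a multinomial Naive Bayes classifier by spreading each observed term count to related terms. All function names and the `weight` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_relations(docs):
    """Estimate a relation strength r(t -> u) from document-level co-occurrence.

    Assumption: the context is the whole document, and r(t -> u) is the share
    of t's co-occurrences that involve u.
    """
    pair_counts = defaultdict(Counter)
    for doc in docs:
        for t, u in combinations(sorted(set(doc)), 2):
            pair_counts[t][u] += 1
            pair_counts[u][t] += 1
    relations = {}
    for t, neighbours in pair_counts.items():
        total = sum(neighbours.values())
        relations[t] = {u: c / total for u, c in neighbours.items()}
    return relations


def train_augmented_nb(docs, labels, relations, weight=0.5):
    """Train multinomial Naive Bayes with counts spread to related terms.

    Assumption: each observed term adds `weight * r(t -> u)` pseudo-counts
    to every related term u of the same class.
    """
    class_docs = Counter(labels)
    counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        for t in doc:
            counts[label][t] += 1.0
            for u, r in relations.get(t, {}).items():
                counts[label][u] += weight * r
    # The vocabulary covers observed terms and terms reached through relations.
    vocab = sorted({t for c in counts for t in counts[c]})
    model = {}
    for c in class_docs:
        total = sum(counts[c].values()) + len(vocab)  # add-one smoothing
        model[c] = {
            "prior": math.log(class_docs[c] / len(labels)),
            "likelihood": {
                t: math.log((counts[c][t] + 1.0) / total) for t in vocab
            },
        }
    return model


def classify(model, doc):
    """Return the class with the highest log-posterior; unknown terms are skipped."""
    def score(c):
        params = model[c]
        return params["prior"] + sum(
            params["likelihood"].get(t, 0.0) for t in doc
        )
    return max(model, key=score)


# Toy data: relations come from unlabeled text, labels from two short documents.
background = [
    ["flights", "airfare", "hotel"],
    ["airfare", "flights"],
    ["compiler", "python", "bug"],
    ["python", "bug"],
]
train_docs = [["flights", "hotel"], ["python", "compiler"]]
train_labels = ["travel", "tech"]

relations = cooccurrence_relations(background)
model = train_augmented_nb(train_docs, train_labels, relations)
print(classify(model, ["airfare"]))
```

In this toy setup the query term "airfare" never appears in the labeled data, so a plain Naive Bayes model would have no evidence for it. The relation-derived pseudo-counts are what let a one-word query, comparable to a short Web search query, lean towards a class.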

Year	DOI	Venue
2009	10.1145/1645953.1646192	CIKM
Keywords	Field	DocType
explosive online textual data,term relationship,text classification,so-called feature sparsity problem,naive bayes classifier,exploiting term relationship,enough training data,available document taxonomy,open-domain data,document classification,practical classification setting,classification performance	Training set,Document classification,Data mining,Data set,Naive Bayes classifier,Information retrieval,Computer science,Web query classification,Artificial intelligence,Machine learning	Conference
Citations	PageRank	References
3	0.45	8
Authors
7

Authors (7 rows)

Name	Order	Citations	PageRank
Dou Shen	1	1224	59.46
Jianmin Wu	2	10	0.96
Bin Cao	3	573	25.94
Jian-Tao Sun	4	1629	74.03
Qiang Yang	5	17039	875.69
Zheng Chen	6	5019	256.89
Ying Li	7	265	21.64
