Automatic Chinese text classification using n-gram model - Citegraph

Paper Info

Title
Automatic Chinese text classification using n-gram model

Abstract
Automatic Chinese text classification is an important and well-known research topic in information retrieval and natural language processing. However, previous studies often ignore the problem of word segmentation and the relationships between words. This paper proposes an N-gram-based language model for Chinese text classification that takes the relationships between words into account. To address the out-of-vocabulary problem, a novel smoothing method based on logistic regression is also proposed to improve performance. The experimental results show that the approach outperforms the previous N-gram-based classification model by more than 11% in micro-average F-measure.
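As a rough illustration of the general idea of classifying text with per-class N-gram language models, here is a minimal sketch. It is not the paper's method: the abstract does not describe the logistic-regression-based smoothing in enough detail to reproduce it, so plain add-alpha (Laplace) smoothing is swapped in to handle unseen N-grams. The use of character bigrams (which avoids word segmentation), the class name `BigramClassifier`, the helper `char_bigrams`, and the toy training data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Return overlapping character bigrams, with start/end markers added."""
    padded = "^" + text + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramClassifier:
    """One character-bigram language model per class, add-alpha smoothed.

    Hypothetical illustration only; the paper's smoothing is different.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.bigram_counts = defaultdict(Counter)   # label -> bigram -> count
        self.context_counts = defaultdict(Counter)  # label -> first char -> count
        self.doc_counts = Counter()                 # label -> number of documents
        self.vocab = set()                          # all characters seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for bg in char_bigrams(text):
                self.bigram_counts[label][bg] += 1
                self.context_counts[label][bg[0]] += 1
                self.vocab.update(bg)
        return self

    def log_score(self, text, label):
        """log P(label) + sum of log P(c_i | c_{i-1}, label) over the text."""
        v = len(self.vocab)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for bg in char_bigrams(text):
            num = self.bigram_counts[label][bg] + self.alpha
            den = self.context_counts[label][bg[0]] + self.alpha * v
            score += math.log(num / den)
        return score

    def predict(self, text):
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Toy data, invented for this example.
train_texts = ["股市今日上涨", "央行下调利率", "球队赢得比赛", "球员打进一球"]
train_labels = ["finance", "finance", "sports", "sports"]

clf = BigramClassifier(alpha=1.0).fit(train_texts, train_labels)
print(clf.predict("球队打进两球"))
print(clf.predict("央行上调利率"))
```

The smoothing term `alpha` gives every unseen bigram a small non-zero probability, which is the role a smoothing method plays for the out-of-vocabulary problem mentioned in the abstract; a learned smoothing scheme would replace the fixed `alpha` with estimated values.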

Year	DOI	Venue
2010	10.1007/978-3-642-12179-1_38	ICCSA (3)
Keywords	Field	DocType
chinese text classification,automatic chinese text classification,micro-average f-measure,information retrieval,out-of-vocabulary problem,natural language processing,logistic regression,n-gram-based language model,n-gram model,previous n-gram-based classification model,language model,word segmentation,n gram,feature selection	Bag-of-words model,Feature selection,Computer science,Text segmentation,Smoothing,Natural language processing,n-gram,Language identification,Artificial intelligence,Logistic regression,Machine learning,Language model	Conference
Volume	ISSN	ISBN
6018	0302-9743	3-642-12178-0
Citations	PageRank	References
0	0.34	16
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Show-Jane Yen	1	537	130.05
Yue-Shi Lee	2	543	41.14
Yu-Chieh Wu	3	247	23.16
Jia-Ching Ying	4	34	3.18
Vincent S. Tseng	5	2923	161.33
