Title
Chinese Text Classification Using Key Characters String Kernel
Abstract
Most Chinese text classification methods are based on Chinese word segmentation and a bag-of-words (BOW) representation, so classification performance depends heavily on segmentation accuracy. Unfortunately, perfectly precise and unambiguous segmentation cannot be achieved. To address this problem, a novel Chinese text classification method using a string kernel is presented. A string kernel computes the similarity of a pair of documents by comparing the substrings they have in common. Experimental results show that our method greatly improves classification performance on small training data sets. Although the performance of the traditional string kernel is comparable to that of BOW methods on a larger data set, the dimension of its feature space is so high that the computation is very time-consuming. Our proposed key characters string kernel technique addresses both the efficiency and the effectiveness problems. Experiments on a larger data set show that an SVM with the key characters string kernel achieves superior performance.
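The idea of comparing documents through shared substrings, without any word segmentation, can be illustrated with a minimal sketch. It uses the p-spectrum kernel (contiguous character n-grams), which is only one simple member of the string kernel family; the abstract does not say which string kernel the paper uses or how key characters are selected. The function names and the optional `key_chars` filter are hypothetical, introduced here purely for illustration.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text, p=2, key_chars=None):
    """Count contiguous character p-grams in text.

    key_chars is a hypothetical filter (an assumption, not the paper's
    procedure): if given, characters outside the set are dropped before
    p-grams are extracted, which shrinks the feature space.
    """
    if key_chars is not None:
        text = "".join(ch for ch in text if ch in key_chars)
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a, b, p=2, key_chars=None):
    """Unnormalized p-spectrum kernel: dot product of p-gram count vectors."""
    ca = ngram_counts(a, p, key_chars)
    cb = ngram_counts(b, p, key_chars)
    if len(ca) > len(cb):
        ca, cb = cb, ca  # iterate over the smaller count table
    return sum(n * cb[g] for g, n in ca.items())


def normalized_kernel(a, b, p=2, key_chars=None):
    """Cosine-normalized kernel value, so document length matters less."""
    kab = spectrum_kernel(a, b, p, key_chars)
    kaa = spectrum_kernel(a, a, p, key_chars)
    kbb = spectrum_kernel(b, b, p, key_chars)
    if kaa == 0 or kbb == 0:
        return 0.0
    return kab / sqrt(kaa * kbb)
```

For example, `spectrum_kernel("我喜欢机器学习", "我喜欢深度学习")` counts the three shared character bigrams (我喜, 喜欢, 学习) directly on the raw character sequence. A matrix of such kernel values between all training documents could then be passed to any SVM implementation that accepts a precomputed kernel.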
Year: 2009
DOI: 10.1109/SKG.2009.59
Venue: SKG
Keywords: string kernel, chinese text classification, novel chinese text classification, kernel technique, key characters string kernel, support vector machines, text classification, proposed key characters string, svm, key characters, chinese text classification method, traditional string kernel, support vector machine, larger data, classification, natural language processing, chinese word segmentation, small training data set, classification performance, text analysis, feature extraction, accuracy, data mining, feature space, kernel, bag of words
Field: Substring, Radial basis function kernel, Pattern recognition, Kernel embedding of distributions, Computer science, Support vector machine, Text segmentation, Tree kernel, Artificial intelligence, String kernel, String metric
DocType: Conference
ISBN: 978-0-7695-3810-5
Citations: 0
PageRank: 0.34
References: 13
Authors: 4
Name | Order | Citations | PageRank
S. Zheng | 1 | 14 | 2.29
Yang Yu-Jiu | 2 | 89 | 19.30
Wu H | 3 | 0 | 1.01
Liu W. | 4 | 209 | 22.42