Title
Question classification based on co-training style semi-supervised learning
Abstract
In statistical question classification, semi-supervised learning, which can exploit abundant unlabeled samples, has received substantial attention in recent years. This paper proposes a novel question classification approach based on co-training style semi-supervised learning. The method extracts high-frequency keywords as classification features and uses word semantic similarity to adjust the feature weights. The classifiers are first trained on labeled data, and the learned models are then refined with unlabeled data, which are assigned labels when the classifiers agree on them. Experiments on a Chinese question answering system in the tourism domain compared different feature selections, supervised and semi-supervised algorithms, feature dimensions, and unlabeled rates. The results show that the proposed method effectively improves classification accuracy. Specifically, with 40% of the training set unlabeled, the average accuracy reaches 88.9% on coarse types and 78.2% on fine types, an improvement of about 2 to 4 percentage points.
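The agreement-based refinement described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the tiny naive Bayes classifier, the two feature views (question word versus remaining keywords), the English toy questions, and every name below (`KeywordNB`, `co_train`, `view_a`, `view_b`) are assumptions made only for illustration, and the semantic-similarity weighting step is omitted.

```python
import math
from collections import Counter, defaultdict


class KeywordNB:
    """Tiny multinomial naive Bayes over keyword features (illustrative only)."""

    def __init__(self, view):
        self.view = view  # hypothetical: maps a question string to feature tokens

    def fit(self, questions, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for q, y in zip(questions, labels):
            feats = self.view(q)
            self.word_counts[y].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, question):
        feats = self.view(question)
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for y in sorted(self.class_counts):
            n_y = sum(self.word_counts[y].values())
            score = math.log(self.class_counts[y] / total)
            for f in feats:
                # Laplace smoothing so unseen keywords do not zero out a class.
                score += math.log((self.word_counts[y][f] + 1) / (n_y + len(self.vocab) + 1))
            if score > best_score:
                best, best_score = y, score
        return best


def fit_pair(labeled, view_a, view_b):
    qs = [q for q, _ in labeled]
    ys = [y for _, y in labeled]
    return KeywordNB(view_a).fit(qs, ys), KeywordNB(view_b).fit(qs, ys)


def co_train(labeled, unlabeled, view_a, view_b, max_rounds=5):
    """Grow the labeled set with questions on which both classifiers agree."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        clf_a, clf_b = fit_pair(labeled, view_a, view_b)
        agreed = [(q, clf_a.predict(q)) for q in pool
                  if clf_a.predict(q) == clf_b.predict(q)]
        if not agreed:
            break  # nothing new to learn from the remaining pool
        labeled.extend(agreed)
        moved = {q for q, _ in agreed}
        pool = [q for q in pool if q not in moved]
    clf_a, clf_b = fit_pair(labeled, view_a, view_b)  # final refit on grown set
    return clf_a, clf_b, labeled


# Assumed feature views: the question word vs. the remaining keywords.
def view_a(q):
    return q.split()[:1]


def view_b(q):
    return q.split()[1:]


# Toy tourism-style questions (hypothetical data, not from the paper).
seed = [
    ("where is the stone forest", "LOCATION"),
    ("where is the old town", "LOCATION"),
    ("how much is the ticket price", "PRICE"),
    ("how much does the hotel cost", "PRICE"),
]
pool = ["where is the lake located", "how much is the boat ticket"]

clf_a, clf_b, grown = co_train(seed, pool, view_a, view_b)
print(len(grown), clf_a.predict("where is the museum"))
```

Requiring agreement between two classifiers trained on different views is one simple way to limit the label noise that enters the training set; a confidence threshold could be added on top, but that would be a further assumption beyond what the abstract states.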
Year: 2010
DOI: 10.1016/j.patrec.2010.06.010
Venue: Pattern Recognition Letters
Keywords: semi-supervised learning, Chinese question classification, different unlabeled rate, co-training, abundant unlabeled sample, classification accuracy, co-training style, different feature selection, different feature dimension, classification feature, novel question classification approach, statistical question classification, word semantic similarity, unlabeled data, semantic similarity, question answering system, high frequency, feature selection, semi supervised learning
Field: Semantic similarity, Signal processing, Similitude, Semi-supervised learning, Question answering, Pattern recognition, Computer science, Co-training, Supervised learning, Feature extraction, Artificial intelligence, Machine learning
DocType: Journal
Volume: 31
Issue: 13
ISSN: 0167-8655
Citations: 11
PageRank: 0.61
References: 7
Authors: 6
Name | Order | Citations | PageRank
Zhengtao Yu | 1 | 460 | 69.08
Lei Su | 2 | 11 | 0.61
Lina Li | 3 | 11 | 0.95
Quan Zhao | 4 | 11 | 0.61
Cunli Mao | 5 | 51 | 11.54
Jianyi Guo | 6 | 20 | 10.99