Improving semi-supervised text classification by using wikipedia knowledge - Citegraph

Paper Info

Title
Improving semi-supervised text classification by using wikipedia knowledge

Abstract
Semi-supervised text classification uses both labeled and unlabeled data to construct classifiers. The key issue is how to utilize the unlabeled data. Clustering-based classification outperforms other semi-supervised text classification algorithms. However, its gains are still limited because the vector space model representation largely ignores the semantic relationships between words. In this paper, we propose a new approach that addresses this problem by using Wikipedia knowledge. We enrich the document representation with Wikipedia semantic features (concepts and categories), propose a new similarity measure based on the semantic relevance between Wikipedia features, and apply this similarity measure to clustering-based classification. Experimental results on several corpora show that the proposed method effectively improves semi-supervised text classification performance.
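
The abstract does not give the paper's similarity formula, so the following is only a minimal sketch of the general idea: blending word-level cosine similarity with a concept-level score so that documents sharing no words can still be judged similar. The function names, the `relatedness` lookup table, the `alpha` weight, and the averaging scheme are all assumptions made for illustration, not the authors' method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def concept_similarity(concepts_a, concepts_b, relatedness):
    """Symmetric best-match average over two sets of concept names.

    `relatedness` is a hypothetical dict mapping (concept, concept) pairs
    to a score in [0, 1]; identical concepts score 1.0, unknown pairs 0.0.
    """
    def rel(c, d):
        if c == d:
            return 1.0
        return relatedness.get((c, d), relatedness.get((d, c), 0.0))

    def directed(xs, ys):
        if not xs or not ys:
            return 0.0
        return sum(max(rel(c, d) for d in ys) for c in xs) / len(xs)

    return 0.5 * (directed(concepts_a, concepts_b)
                  + directed(concepts_b, concepts_a))


def combined_similarity(doc_a, doc_b, relatedness, alpha=0.5):
    """Blend word overlap and concept relatedness (alpha is an assumed weight)."""
    word_part = cosine(doc_a["words"], doc_b["words"])
    concept_part = concept_similarity(
        doc_a["concepts"], doc_b["concepts"], relatedness)
    return alpha * word_part + (1.0 - alpha) * concept_part


# Two toy documents with no shared words but related concepts.
doc_a = {"words": {"puma": 1.0}, "concepts": {"Cougar"}}
doc_b = {"words": {"jaguar": 1.0}, "concepts": {"Jaguar"}}
relatedness = {("Cougar", "Jaguar"): 0.8}  # made-up score for illustration
score = combined_similarity(doc_a, doc_b, relatedness)
```

With a pure vector space model these two toy documents would have similarity 0; the concept term is what lets a clustering step place them near each other.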

Year	DOI	Venue
2013	10.1007/978-3-642-38562-9_3	WAIM
Keywords	Field	DocType
wikipedia feature, classification method, wikipedia knowledge, semantic relevance, improving semi-supervised text classification, wikipedia semantic feature, semi-supervised text classification performance, semantic relationship, semi-supervised text classification algorithm, semi-supervised text classification, unlabeled data, wikipedia	Data mining, Information retrieval, Similarity measure, Computer science, Semantic relevance, Explicit semantic analysis, Document representation, Vector space model, Statistical classification, Cluster analysis	Conference
Citations	PageRank	References
1	0.37	20
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Zhilin Zhang	1	1	0.37
Huaizhong Lin	2	67	12.34
Pengfei Li	3	1	1.05
Huazhong Wang	4	1	0.37
Dongming Lu	5	163	32.29
