Title
An efficient Wikipedia semantic matching approach to text document classification.
Abstract
Traditional classification approaches based on keyword matching represent each text document as a set of keywords and ignore semantic information, which reduces classification accuracy. To address this problem, a classification approach based on Wikipedia matching was proposed. It represents each document as a concept vector in the Wikipedia semantic space in order to capture the semantics of the text, and it has been shown to improve classification accuracy. However, the immense size of the Wikipedia semantic space makes generating a concept vector slow, which limits the usability of the approach in an online environment. In this paper, we propose an efficient Wikipedia semantic matching approach to document classification. First, we define several heuristic selection rules that quickly pick out the concepts related to a document from the Wikipedia semantic space, so that it is no longer necessary to match against all concepts in the semantic space, which greatly improves the efficiency of concept vector generation. Second, based on the semantic representation of each text document, we compute the similarity between documents in order to classify them accurately. Finally, evaluation experiments demonstrate the effectiveness of our approach: it improves the classification efficiency of Wikipedia matching without compromising classification accuracy.
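The pipeline outlined in the abstract (select candidate concepts, build a concept vector, compare documents by similarity) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `CONCEPTS` table stands in for the Wikipedia semantic space, the inverted-index lookup stands in for the paper's heuristic selection rules (which the abstract does not specify), and cosine similarity with a nearest-neighbor decision is just one plausible choice of similarity and classifier.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy "semantic space": concept name -> descriptive terms.
CONCEPTS = {
    "Machine learning": ["learning", "model", "training", "classifier"],
    "Football": ["football", "goal", "match", "league"],
    "Stock market": ["stock", "market", "shares", "trading"],
}

# Inverted index: term -> concepts whose description contains it.
INDEX = defaultdict(set)
for concept, terms in CONCEPTS.items():
    for term in terms:
        INDEX[term].add(concept)


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    return [w.strip(".,;:!?").lower() for w in text.split()]


def concept_vector(text):
    """Map a document to weights over candidate concepts only."""
    counts = Counter(tokenize(text))
    # Candidate selection (assumed stand-in for the heuristic rules):
    # score only concepts that share at least one term with the document,
    # instead of matching against every concept in the space.
    candidates = set()
    for term in counts:
        candidates |= INDEX.get(term, set())
    return {c: float(sum(counts[t] for t in CONCEPTS[c])) for c in candidates}


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(text, labeled_docs):
    """Return the label of the most similar labeled document (1-NN)."""
    vec = concept_vector(text)
    best = max(labeled_docs, key=lambda item: cosine(vec, concept_vector(item[0])))
    return best[1]


train = [
    ("The classifier model needs training data", "tech"),
    ("The league match ended with a late goal", "sports"),
]
print(classify("A new learning model", train))
```

The point of the sketch is the efficiency argument: `concept_vector` touches only the concepts reachable from the document's own terms, so its cost depends on the document rather than on the size of the concept space.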
Year: 2017
DOI: 10.1016/j.ins.2017.02.009
Venue: Inf. Sci.
Keywords: Wikipedia matching, Keyword matching, Document classification, Semantics
Field: Document classification, Heuristic, Infobox, Information retrieval, Computer science, Explicit semantic analysis, Precondition, Text document classification, Semantics, Semantic matching
DocType: Journal
Volume: 393
Issue: C
ISSN: 0020-0255
Citations: 15
PageRank: 0.65
References: 28
Authors: 8
Name | Order | Citations | PageRank
Zongda Wu | 1 | 251 | 16.20
Hui Zhu | 2 | 83 | 17.00
Guiling Li | 3 | 37 | 2.17
Zongmin Cui | 4 | 61 | 9.89
Hui Huang | 5 | 84 | 17.04
Jun Li | 6 | 39 | 8.57
Enhong Chen | 7 | 2106 | 165.57
Guandong Xu | 8 | 640 | 75.03