Title
Word sense disambiguation for exploiting hierarchical thesauri in text classification
Abstract
The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving text classification performance by extending the traditional “bag of words” representation to incorporate syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document also implies proximity in the HT graph. We argue that the high precision exhibited by our WSD algorithm on various manually disambiguated benchmark datasets makes it appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of GVSM kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora and achieve a systematic improvement in classification accuracy with the SVM algorithm, especially when the training set is small.
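The abstract does not give the kernel's definition. As a rough illustration of the general GVSM idea it refers to, the sketch below computes K(d1, d2) = (d1 P) · (d2 P), where P maps terms to concepts. The vocabulary, the concepts, the matrix values, and the function name `gvsm_kernel` are all hypothetical and chosen only for illustration; this is not the kernel defined in the paper.

```python
# Illustrative GVSM-style kernel. All names and values here are assumptions,
# not taken from the paper.

def gvsm_kernel(d1, d2, term_concept):
    """Return (d1 P) . (d2 P), where P = term_concept maps terms to concepts."""
    def project(doc):
        n_concepts = len(term_concept[0])
        # Project a term-frequency vector into concept space.
        return [sum(doc[t] * term_concept[t][c] for t in range(len(doc)))
                for c in range(n_concepts)]
    p1, p2 = project(d1), project(d2)
    return sum(a * b for a, b in zip(p1, p2))

# Hypothetical vocabulary: ["car", "automobile", "banana"]
# Hypothetical concepts:   ["vehicle", "fruit"]
P = [
    [1.0, 0.0],  # car        -> vehicle
    [1.0, 0.0],  # automobile -> vehicle
    [0.0, 1.0],  # banana     -> fruit
]
doc_car = [1, 0, 0]
doc_auto = [0, 1, 0]
doc_banana = [0, 0, 1]

# Plain bag-of-words dot product: the two synonyms share no terms.
bow_similarity = sum(a * b for a, b in zip(doc_car, doc_auto))
# GVSM-style similarity: both documents project onto the same concept.
semantic_similarity = gvsm_kernel(doc_car, doc_auto, P)
```

In this toy setting the bag-of-words similarity of the two synonym documents is zero, while the concept-space similarity is positive. That is the kind of semantic relation a thesaurus-based kernel is meant to expose.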
Year
2005
DOI
10.1007/11564126_21
Venue
PKDD
Keywords
word sense disambiguation, classification task, significant semantic information, classification accuracy, hierarchical thesaurus, svm algorithm, semantic kernel, semantic relationship, text classification task, ht graph, semantic relation
Field
Bag-of-words model, Information retrieval, Computer science, Support vector machine, Natural language processing, Artificial intelligence, Knowledge extraction, Parsing, Kernel method, Syntax, Semantics, Polysemy
DocType
Conference
Volume
3721
ISSN
0302-9743
ISBN
3-540-29244-6
Citations
37
PageRank
1.53
References
15
Authors (5)
Name                    Order   Citations   PageRank
Dimitrios Mavroeidis    1       130         9.50
George Tsatsaronis      2       427         29.66
Michalis Vazirgiannis   3       3942        268.00
Martin Theobald         4       1474        72.06
Gerhard Weikum          5       12710       2146.01