Title
Using Graphs and Semantic Information to Improve Text Classifiers.
Abstract
Text classification using semantic information is an active research trend because semantics can represent text content more accurately than bag-of-words (BOW) approaches. Representing semantics through graphs, in turn, has several advantages over the traditional feature-vector representation, and it allows error-tolerant graph matching techniques to be used for text classification. Nevertheless, very few methodologies in the literature use graph-based semantic representation. This work proposes a methodology for representing the semantic information of a summarized text as a graph. The discourse representation structure of the text is used to capture its semantic content and is then transformed into a graph. Five graph matching techniques based on Maximum Common Subgraphs (mcs) and Minimum Common Supergraphs (MCS) are evaluated with a k-NN classifier on 20 classes from the Reuters dataset, taking 10 documents from each class for both training and testing. The results show that the technique has the potential to perform text classification as well as traditional BOW approaches. Moreover, a majority-voting combination of the semantic representation and a traditional BOW approach improved recognition accuracy on the same dataset.
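As a rough illustration of how a maximum-common-subgraph distance can drive a k-NN classifier, here is a minimal sketch. It is not the authors' implementation. The abstract does not name its five distance measures, so the sketch uses one well-known mcs-based distance, d(g1, g2) = 1 - |mcs(g1, g2)| / max(|g1|, |g2|). It assumes that node labels are unique within a graph, so the mcs reduces to a set intersection, and that graph size is the number of nodes plus edges. The function names, the triple format, and the toy data are all hypothetical.

```python
from collections import Counter


def graph(triples):
    """Build a graph as (node set, edge set) from (source, relation, target) triples."""
    edges = set(triples)
    nodes = {n for s, _, t in edges for n in (s, t)}
    return nodes, edges


def size(g):
    """Graph size, taken here (an assumption) as nodes plus edges."""
    nodes, edges = g
    return len(nodes) + len(edges)


def mcs(g1, g2):
    """Maximum common subgraph; with unique node labels it is a set intersection."""
    return g1[0] & g2[0], g1[1] & g2[1]


def mcs_distance(g1, g2):
    """Distance 1 - |mcs| / max(|g1|, |g2|); 0 for identical graphs, 1 for disjoint ones."""
    denom = max(size(g1), size(g2))
    if denom == 0:
        return 0.0
    return 1.0 - size(mcs(g1, g2)) / denom


def knn_predict(query, training, k=3):
    """Return the majority label among the k training graphs closest to the query."""
    ranked = sorted(training, key=lambda item: mcs_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): each document is a small set of semantic triples.
training = [
    (graph([("bank", "raise", "rate"), ("rate", "affect", "loan")]), "finance"),
    (graph([("bank", "cut", "rate")]), "finance"),
    (graph([("team", "win", "match")]), "sport"),
]
query = graph([("bank", "raise", "rate")])
predicted = knn_predict(query, training, k=3)
```

Without the unique-label assumption, computing the mcs is a much harder subgraph-isomorphism problem, so this sketch does not carry over to general graphs unchanged.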
Year: 2014
DOI: 10.1007/978-3-319-10888-9_33
Venue: ADVANCES IN NATURAL LANGUAGE PROCESSING
Field: Graph kernel, Text graph, Feature vector, Computer science, Distance, Explicit semantic analysis, Matching (graph theory), Artificial intelligence, Classifier (linguistics), Semantics, Machine learning
DocType: Conference
Volume: 8686
ISSN: 0302-9743
Citations: 1
PageRank: 0.35
References: 8
Authors: 4
Name | Order | Citations | PageRank
Nibaran Das | 1 | 391 | 40.72
Swarnendu Ghosh | 2 | 20 | 5.37
Teresa Gonçalves | 3 | 14 | 1.24
Paulo Quaresma | 4 | 18 | 2.73