Abstract | ||
---|---|---|
Text classification using semantic information is a current research trend because it can represent text content more accurately than bag-of-words (BOW) approaches. In addition, representing semantics through graphs has several advantages over the traditional feature-vector representation, so error-tolerant graph matching techniques can be used for text classification. Nevertheless, very few methodologies in the literature use semantic representation through graphs. In the present work, a methodology is proposed to represent the semantic information of a summarized text as a graph. The discourse representation structure of a text is used to represent its semantic content and is afterwards transformed into a graph. Five different graph matching techniques based on Maximum Common Subgraphs (mcs) and Minimum Common Supergraphs (MCS) are evaluated on 20 classes from the Reuters dataset, taking 10 documents of each class for both training and testing, using the k-NN classifier. The results show that the technique has the potential to perform text classification as well as traditional BOW approaches. Moreover, a majority-voting-based combination of the semantic representation and a traditional BOW approach provided improved recognition accuracy on the same dataset. |
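
The abstract does not give the five distance formulas, so the following is only a minimal sketch of how one mcs-based distance could be combined with k-NN. It assumes the commonly used normalization d = 1 - |mcs| / max(|G1|, |G2|), assumes node labels are unique within a graph (so the maximum common subgraph reduces to set intersection), and measures graph size as nodes plus edges. The names `nodes_of`, `mcs_distance`, and `knn_predict`, and the triple-based graph encoding, are hypothetical and not taken from the paper.

```python
from collections import Counter


def nodes_of(edges):
    """Collect the node labels that appear in a set of (source, relation, target) edges."""
    return {node for (src, _rel, dst) in edges for node in (src, dst)}


def mcs_distance(g1, g2):
    """Assumed distance: 1 - |mcs(g1, g2)| / max(|g1|, |g2|).

    Each graph is a set of labeled edges. With unique node labels, the
    common subgraph is just the shared nodes plus the shared edges.
    """
    common = len(nodes_of(g1) & nodes_of(g2)) + len(g1 & g2)
    largest = max(len(nodes_of(g1)) + len(g1), len(nodes_of(g2)) + len(g2))
    return 1.0 - common / largest if largest else 0.0


def knn_predict(query, training, k=3):
    """Label a query graph by majority vote among its k nearest training graphs."""
    ranked = sorted(training, key=lambda item: mcs_distance(query, item[0]))
    votes = Counter(label for _graph, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Here `training` is a list of `(graph, label)` pairs. The same `Counter`-based vote could, under the same assumptions, be reused to combine the labels predicted by a graph-based and a BOW-based classifier.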
Year | DOI | Venue |
---|---|---|
2014 | 10.1007/978-3-319-10888-9_33 | ADVANCES IN NATURAL LANGUAGE PROCESSING |
Field | DocType | Volume |
Graph kernel,Text graph,Feature vector,Computer science,Distance,Explicit semantic analysis,Matching (graph theory),Artificial intelligence,Classifier (linguistics),Semantics,Machine learning | Conference | 8686 |
ISSN | Citations | PageRank |
0302-9743 | 1 | 0.35 |
References | Authors |
8 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Nibaran Das | 1 | 391 | 40.72 |
Swarnendu Ghosh | 2 | 20 | 5.37 |
Teresa Gonçalves | 3 | 14 | 1.24 |
Paulo Quaresma | 4 | 18 | 2.73 |