Title
A graph-based multi-level linguistic representation for document understanding
Abstract
We propose a graph-based representation that considers multiple linguistic levels. We introduce MinText, a technique for extracting features from the graph. We present a case study that analyzes the performance of the proposed methods.

Document understanding requires discovering meaningful patterns in text, which in turn requires analyzing documents and extracting information useful for a given purpose. The documents to be analyzed must be represented in some way, and different representations of the same piece of text may lead to different information extraction outcomes. It is therefore important to have a reliable text representation schema that incorporates as many features as possible while still allowing efficient document understanding algorithms to be used. In this paper, we propose a graph-based representation of textual documents that employs different levels of formal representation of natural language. The schema takes into account different linguistic levels: lexical, morphological, syntactic and semantic. It is accompanied by a technique that extracts useful text patterns based on the idea of minimum paths in the graph. The proposed representation schema has been tested in one case study (Question Answering for Machine Reading Evaluation, QA4MRE), and the results of the experiments are described. The results show that the proposed graph-based multi-level linguistic representation schema may be successfully used in the broader framework of document understanding.
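The idea of combining linguistic levels in one graph and reading features off minimum paths can be illustrated with a small sketch. This is not the authors' MinText implementation: the node prefixes (`word:`, `lemma:`, `pos:`), the helper names `build_graph`, `shortest_path` and `path_features`, the hand-annotated example sentence, and the use of an unweighted breadth-first search are all assumptions made for illustration. The semantic level is omitted.

```python
from collections import deque


def build_graph(tokens, dependencies):
    """Build an undirected labeled graph from hand-annotated tokens.

    tokens: list of (word, lemma, pos) tuples (hypothetical input format).
    dependencies: list of (head_index, dependent_index, relation) tuples.
    """
    graph = {}

    def add_edge(u, v, label):
        graph.setdefault(u, {})[v] = label
        graph.setdefault(v, {})[u] = label

    for word, lemma, pos in tokens:
        # Lexical node linked to its morphological information.
        add_edge(f"word:{word}", f"lemma:{lemma}", "lemma")
        add_edge(f"word:{word}", f"pos:{pos}", "pos")
    for head, dep, relation in dependencies:
        # Syntactic level: dependency relation between two word nodes.
        add_edge(f"word:{tokens[head][0]}", f"word:{tokens[dep][0]}", relation)
    return graph


def shortest_path(graph, source, target):
    """Return one minimum-length path (unweighted BFS), or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, {}):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def path_features(graph, path):
    """Use the edge labels along a path as a simple feature sequence."""
    return [graph[u][v] for u, v in zip(path, path[1:])]


# Hand-annotated example sentence: "dog chases cats".
tokens = [("dog", "dog", "NN"), ("chases", "chase", "VBZ"), ("cats", "cat", "NNS")]
dependencies = [(1, 0, "nsubj"), (1, 2, "dobj")]

graph = build_graph(tokens, dependencies)
path = shortest_path(graph, "word:dog", "word:cats")
features = path_features(graph, path)
```

In this sketch the path between two word nodes runs through the verb, and the edge labels along it act as a compact pattern that could be compared across documents. A real system would obtain the annotations from a tagger and parser and might weight edges differently per linguistic level.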
Year: 2014
DOI: 10.1016/j.patrec.2013.12.004
Venue: Pattern Recognition Letters
Keywords: useful text pattern, different level, representation schema, different information extraction outcome, account different linguistic level, reliable text representation schema, document understanding, graph-based multi-level linguistic representation, formal representation, different representation, graph-based representation, text mining
Field: Text graph, Computer science, Artificial intelligence, Natural language processing, Schema (psychology), Machine reading, Graph, Information retrieval, Pattern recognition, Formal representation, Natural language, Information extraction, Linguistics, Semantics
DocType: Journal
Volume: 41
Issue: C
ISSN: 0167-8655
Citations: 9
PageRank: 1.14
References: 16
Authors: 4
Name                   Order   Citations   PageRank
David Pinto            1       280         35.77
Helena Gómez-Adorno    2       40          16.01
Darnes Vilariño        3       43          19.68
Vivek Kumar Singh      4       270         39.83