Title
A model for information extraction in Portuguese based on text patterns
Abstract
This paper proposes an information extraction model that identifies text patterns representing relations between two entities. The underlying hypothesis is that, given a set of entity pairs representing a specific relation, text patterns expressing that relation can be found in sentences from documents containing those entities. Once these text patterns are identified, the model can attempt to extract a complementary entity, provided that the first entity of the relation and the related text patterns are given. Pattern selection relies on regular expressions, pattern frequency, and the identification of less relevant words. Modern search engine APIs and HTML parsers are used to retrieve and parse web pages in real time, eliminating the need for a pre-established corpus. Document counts retrieved within a given timeframe are also used to help select among the extracted entities.
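The pipeline summarized in the abstract (collect patterns between known entity pairs, filter them, then use them to extract a complementary entity and rank candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: sentences are supplied as an in-memory list instead of being fetched through search engine APIs and HTML parsers; the STOPWORDS set, the capitalized-word ENTITY heuristic, the min_freq and max_tokens thresholds, and the doc_count callable (a stand-in for search engine document counts, with no timeframe modeled) are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical set of "less relevant" Portuguese words (not the paper's own list).
STOPWORDS = {"a", "o", "as", "os", "de", "da", "do", "das", "dos", "e",
             "em", "na", "no", "um", "uma", "que", "por", "para", "com"}

# Assumption: a run of capitalized words (e.g. "São Paulo") approximates an entity.
CAP = "[A-ZÁÂÃÀÉÊÍÓÔÕÚÇ]"
ENTITY = rf"({CAP}\w*(?:\s+{CAP}\w*)*)"


def find_patterns(pairs, sentences, max_tokens=5):
    """Count the text found between entity 1 and entity 2 in each sentence."""
    counts = Counter()
    for e1, e2 in pairs:
        between = re.compile(re.escape(e1) + r"\s+(.+?)\s+" + re.escape(e2))
        for sentence in sentences:
            for match in between.finditer(sentence):
                pattern = match.group(1).strip().lower()
                if 0 < len(pattern.split()) <= max_tokens:
                    counts[pattern] += 1
    return counts


def select_patterns(counts, min_freq=2):
    """Keep frequent patterns that contain at least one word outside STOPWORDS."""
    selected = []
    for pattern, freq in counts.most_common():
        words = re.findall(r"\w+", pattern)
        if freq >= min_freq and any(w not in STOPWORDS for w in words):
            selected.append(pattern)
    return selected


def extract(e1, patterns, sentences):
    """Count candidate complementary entities that follow e1 + pattern."""
    candidates = Counter()
    for pattern in patterns:
        # Pattern text matches case-insensitively; the entity must stay capitalized.
        regex = re.compile(re.escape(e1) + r"\s+(?i:" + re.escape(pattern)
                           + r")\s+" + ENTITY)
        for sentence in sentences:
            for match in regex.finditer(sentence):
                candidates[match.group(1)] += 1
    return candidates


def rank(e1, candidates, doc_count):
    """Order candidates by pattern hits, breaking ties with doc_count(e1, c)."""
    return sorted(candidates,
                  key=lambda c: (candidates[c], doc_count(e1, c)),
                  reverse=True)


# Usage with toy data ("capital of" relation); all values are illustrative.
sentences = [
    "Lisboa é a capital de Portugal.",
    "Madrid é a capital de Espanha e tem muitos museus.",
    "Lisboa e Portugal aparecem juntos.",
    "Madrid e Espanha são citadas.",
    "Paris é a capital de França.",
    "Paris é a capital de Moda.",
]
pairs = [("Lisboa", "Portugal"), ("Madrid", "Espanha")]
fake_counts = {"França": 900, "Moda": 40}  # hypothetical document counts

patterns = select_patterns(find_patterns(pairs, sentences))
candidates = extract("Paris", patterns, sentences)
ranked = rank("Paris", candidates, lambda e1, c: fake_counts.get(c, 0))
print(patterns)  # ['é a capital de']
print(ranked)    # ['França', 'Moda']
```

Both "é a capital de" and "e" occur twice between the seed pairs, but only the former survives selection: requiring at least one word outside the stopword set discards connective-only patterns such as "e" ("and") regardless of frequency. The document-count tiebreak then separates candidates that the patterns alone cannot distinguish.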
Year
2013
DOI
10.1007/978-3-642-37256-8_30
Venue
CICLing (2)
Keywords
text pattern,pattern selection,information extraction model,modern search engines apis,document count,html parsers,specific relation,entity pair,complementary entity,related text pattern
Field
Text mining,Regular expression,Search engine,Information retrieval,Web page,Computer science,Portuguese,Information extraction,Natural language processing,Artificial intelligence,Parsing,Relationship extraction
DocType
Conference
Citations
0
PageRank
0.34
References
8
Authors
2
Name
Order
Citations
PageRank
Tiago Luis Bonamigo1111.00
Renata Vieira28211.44