Title
Information extraction from semi-structured resources: a two-phase finite state transducers approach
Abstract
The paper presents a new method, based on finite state transducers, for extracting information from semi-structured resources. The method has two clearly distinguished phases. The first, a pre-processing phase, relies strongly on analysis of the document structure and is used to locate records of data in the text. The second phase uses finite state transducers created for extracting the information. The transducers can be modified to achieve the preferred efficiency and can be reused to extract information from other pre-processed documents. We conclude that even untagged text can be treated as semi-structured, provided its structure can be successfully pre-processed. Applying the method, we extracted data from free-form encyclopedia text and created a fully structured database of genotype and phenotype characteristics of organisms.
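The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the blank-line record separator, the state names, the `SHAPES` vocabulary, the field names, and the sample text are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical vocabulary of shape terms; not taken from the paper.
SHAPES = {"rod", "coccus", "spiral"}


def split_records(text: str) -> List[str]:
    """Phase 1 (pre-processing): locate records.

    Assumes, for illustration only, that records are separated by blank lines.
    """
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]


def step(state: str, token: str) -> Tuple[str, Optional[Tuple[str, str]]]:
    """One transducer transition: (state, input token) -> (next state, output).

    The output is a (field, value) pair, or None when nothing is emitted.
    """
    if state == "after_gram":
        if token in ("positive", "negative"):
            return "scan", ("gram_stain", token)
        return "scan", None
    if token == "gram":
        return "after_gram", None
    if token in SHAPES:
        return "scan", ("shape", token)
    return "scan", None


def extract(record: str) -> Dict[str, str]:
    """Phase 2: run the transducer over the tokens of one record."""
    tokens = re.findall(r"[a-z]+", record.lower())
    state = "scan"
    fields: Dict[str, str] = {}
    for token in tokens:
        state, output = step(state, token)
        if output is not None:
            field, value = output
            fields.setdefault(field, value)  # keep the first value seen
    return fields


# Hypothetical sample text standing in for encyclopedia entries.
SAMPLE = """Escherichia coli is a Gram-negative, rod-shaped bacterium.

Staphylococcus aureus is a Gram-positive coccus."""

database = [extract(record) for record in split_records(SAMPLE)]
```

Keeping record location separate from the transducer means the `step` function could, under these assumptions, be reused unchanged on any other text that phase 1 has already split into records.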
Year
2011
DOI
10.1007/978-3-642-22256-6_26
Venue
CIAA
Keywords
two-phase finite state transducers, pre-processed document, pre-processing phase, free form encyclopedia text, untagged text, distinguished phase, new method, finite state transducers, semi-structured resource, finite state, information extraction, document structure, genome, finite state transducer, bioinformatics
Field
Transducer, Data mining, Computer science, Document Structure Description, Finite state, Information extraction, Encyclopedia, Free form, Finite state transducer
DocType
Conference
Volume
6807
ISSN
0302-9743
Citations
1
PageRank
0.35
References
10
Authors
3
Name                       Order  Citations  PageRank
Vesna Pajic                1      3          3.18
Gordana Pavlovic-Lazetic   2      35         7.82
Miloš Pajić                3      2          1.06