Title |
---|
Information extraction from semi-structured resources: a two-phase finite state transducers approach |

Abstract |
---|
This paper presents a new method, based on finite state transducers, for extracting information from semi-structured resources. The method has two clearly distinguished phases. The first, the pre-processing phase, relies heavily on analysis of the document structure and is used to locate data records in the text. The second phase applies finite state transducers constructed for information extraction. The transducers can be modified to achieve the desired efficiency and can be reused to extract information from other pre-processed documents. We conclude that even untagged text can be treated as semi-structured, provided its structure can be successfully pre-processed. As a result, we extracted data from free-form encyclopedia text and built a fully structured database of the genotype and phenotype characteristics of the organisms. |
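
The abstract gives no implementation details, so the following is only a toy sketch, under stated assumptions, of how a structure-based pre-processing phase and a transducer-based extraction phase could fit together. Everything in it is hypothetical and is not the authors' transducer: the blank-line record layout, the `Gram`/`Shape` attribute keywords, the state names, the functions `preprocess`, `field`, `build_transducer`, `transduce` and `to_fields`, and the two made-up sample entries.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy: not the transducers from the paper.
ANY = "*"   # wildcard input: stands for every token lacking a specific transition
COPY = "$"  # output placeholder: emit the current input token unchanged

# (state, input token) -> (next state, output symbol); "" is the empty output.
Table = Dict[Tuple[str, str], Tuple[str, str]]


def preprocess(text: str) -> List[Tuple[str, List[str]]]:
    """Phase 1 (toy): locate records from layout alone.

    Assumes records are separated by blank lines and the first line of a
    record is its name; the remaining lines are split into word tokens and
    the separators ':' and ';'.
    """
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        tokens = re.findall(r"[A-Za-z0-9-]+|[:;]", " ".join(lines[1:]))
        records.append((lines[0].strip(), tokens))
    return records


def field(table: Table, keyword: str, tag: str) -> None:
    """Add states mapping 'keyword : v1 v2 ;' to '<tag> v1 v2 </tag>'."""
    key_state, val_state = f"{tag}-key", f"{tag}-val"
    table[("start", keyword)] = (key_state, "")
    table[(key_state, ":")] = (val_state, f"<{tag}>")
    table[(val_state, ";")] = ("start", f"</{tag}>")
    table[(val_state, ANY)] = (val_state, COPY)


def build_transducer() -> Table:
    table: Table = {}
    field(table, "Gram", "gram")    # hypothetical attribute keywords
    field(table, "Shape", "shape")
    # Unknown attributes are skipped up to the next ';' with empty output.
    table[("start", ANY)] = ("skip", "")
    table[("skip", ";")] = ("start", "")
    table[("skip", ANY)] = ("skip", "")
    return table


def transduce(table: Table, tokens: List[str]) -> Optional[List[str]]:
    """Phase 2: run the deterministic transducer; None means 'rejected'."""
    state, out = "start", []
    for tok in tokens:
        step = table.get((state, tok)) or table.get((state, ANY))
        if step is None:
            return None  # no transition for this token
        state, emitted = step
        if emitted:
            out.append(tok if emitted == COPY else emitted)
    return out if state == "start" else None  # "start" is the only final state


def to_fields(tagged: List[str]) -> Dict[str, str]:
    """Turn '<tag> ... </tag>' spans into one flat database row."""
    row: Dict[str, str] = {}
    tag, words = "", []
    for sym in tagged:
        if sym.startswith("</"):
            row[tag] = " ".join(words)
            tag, words = "", []
        elif sym.startswith("<"):
            tag = sym[1:-1]
        else:
            words.append(sym)
    return row


# Made-up sample input, not data from the paper.
text = """Escherichia coli
Gram: negative; Shape: rod; Habitat: intestine;

Bacillus subtilis
Gram: positive; Shape: rod;
"""
table = build_transducer()
for name, tokens in preprocess(text):
    tagged = transduce(table, tokens)
    print(name, to_fields(tagged) if tagged is not None else "rejected")
# Escherichia coli {'gram': 'negative', 'shape': 'rod'}
# Bacillus subtilis {'gram': 'positive', 'shape': 'rod'}
```

The wildcard and copy placeholder are shorthand for one transition per token of a finite vocabulary, so the table still describes a finite state transducer. Supporting another attribute only means calling `field` again, a small-scale analogue of the abstract's remark that the transducers can be modified and reused on other pre-processed documents.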

Year | DOI | Venue |
---|---|---|
2011 | 10.1007/978-3-642-22256-6_26 | CIAA |

Keywords | Field | DocType |
---|---|---|
two-phase finite state transducers, pre-processed document, pre-processing phase, free form encyclopedia text, untagged text, distinguished phase, new method, finite state transducers, semi-structured resource, finite state, information extraction, document structure, genome, finite state transducer, bioinformatics | Transducer, Data mining, Computer science, Document Structure Description, Finite state, Information extraction, Encyclopedia, Free form, Finite state transducer | Conference |

Volume | ISSN | Citations |
---|---|---|
6807 | 0302-9743 | 1 |

PageRank | References | Authors |
---|---|---|
0.35 | 10 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Vesna Pajic | 1 | 3 | 3.18 |
Gordana Pavlovic-Lazetic | 2 | 35 | 7.82 |
Miloš Pajić | 3 | 2 | 1.06 |