Title
Rule-Based Information Extraction for Structured Data Acquisition using TextMarker
Abstract
Information extraction is concerned with the location of specific items in (unstructured) textual documents, e.g., being applied for the acquisition of structured data. Then, the acquired data can be applied for mining methods requiring structured input data, in contrast to other text mining methods that utilize a bag-of-words approach. This paper presents a semi-automatic approach for structured data acquisition using a rule-based information extraction system. We propose a semi-automatic process model that includes the TEXTMARKER system for information extraction and data acquisition from textual documents. TEXTMARKER applies simple rules for extracting blocks from a given (semi-structured) document, which can be further analyzed using domain-specific rules. Thus, both low-level and higher-level information extraction is supported. We demonstrate the applicability and benefit of the approach with two case studies of two real-world applications.
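The two-stage idea in the abstract (simple rules that cut a semi-structured document into blocks, then domain-specific rules applied inside each block) can be illustrated with a minimal Python sketch. This is not TextMarker rule syntax and not the authors' implementation: the heading pattern, the field rules, the function names, and the sample text are all hypothetical and chosen only for illustration.

```python
import re

# Assumption: a block starts at a line consisting only of "Heading:".
BLOCK_HEADING = re.compile(r"^(?P<heading>[A-Za-z ]+):\s*$")

# Assumption: hypothetical domain-specific rules, keyed by block heading.
FIELD_RULES = {
    "Patient": {"age": re.compile(r"(\d+)\s*years")},
    "Findings": {"temperature_c": re.compile(r"(\d+(?:\.\d+)?)\s*C\b")},
}


def split_blocks(text):
    """Low-level step: group non-empty lines under the preceding heading."""
    blocks = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = BLOCK_HEADING.match(stripped)
        if match:
            current = match.group("heading")
            blocks[current] = []
        elif current is not None and stripped:
            blocks[current].append(stripped)
    return {heading: " ".join(lines) for heading, lines in blocks.items()}


def extract_record(text):
    """Higher-level step: apply per-block rules to fill a flat record."""
    record = {}
    for heading, body in split_blocks(text).items():
        for field, pattern in FIELD_RULES.get(heading, {}).items():
            found = pattern.search(body)
            if found:
                record[field] = found.group(1)
    return record


# Invented sample document, used only to exercise the sketch.
SAMPLE = """Patient:
Male, 54 years old.

Findings:
Temperature 38.5 C on admission.
"""

if __name__ == "__main__":
    print(extract_record(SAMPLE))
```

The resulting flat record is the kind of structured input that mining methods can consume directly, in contrast to a bag-of-words representation of the same text.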
Year: 2008
Venue: LWA
Keywords: rule based, structured data, information extraction
Field: Rule-based system, Text mining, Information retrieval, Computer science, Data acquisition, Information extraction, Data model
DocType: Conference
Citations: 7
PageRank: 0.80
References: 7
Authors: 3
Name              Order  Citations  PageRank
Martin Atzmüller  1      210        23.00
Peter Klügl       2      43         7.00
Frank Puppe       3      649        99.50