Rule-Based Information Extraction for Structured Data Acquisition using TextMarker - Citegraph

Paper Info

Title
Rule-Based Information Extraction for Structured Data Acquisition using TextMarker

Abstract
Information extraction is concerned with the location of specific items in (unstructured) textual documents, e.g., being applied for the acquisition of structured data. Then, the acquired data can be applied for mining methods requiring structured input data, in contrast to other text mining methods that utilize a bag-of-words approach. This paper presents a semi-automatic approach for structured data acquisition using a rule-based information extraction system. We propose a semi-automatic process model that includes the TEXTMARKER system for information extraction and data acquisition from textual documents. TEXTMARKER applies simple rules for extracting blocks from a given (semi-structured) document, which can be further analyzed using domain-specific rules. Thus, both low-level and higher-level information extraction is supported. We demonstrate the applicability and benefit of the approach with two case studies of two real-world applications.
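
The two-stage idea in the abstract (simple rules that cut a semi-structured document into blocks, then domain-specific rules applied inside each block) can be illustrated with a minimal Python sketch. This is not TextMarker rule syntax and not the authors' implementation: the rule names `BLOCK_RULE` and `DOMAIN_RULES`, the helper functions, and the sample text are all hypothetical, chosen only to show how block-level and higher-level extraction might yield a structured record.

```python
import re

# Hypothetical low-level rule: a block is one "Label: body" line.
BLOCK_RULE = re.compile(r"^(?P<label>[A-Za-z ]+):\s*(?P<body>.+)$", re.MULTILINE)

# Hypothetical domain-specific rules, keyed by block label.
DOMAIN_RULES = {
    "Age": re.compile(r"\d+"),            # numeric values only
    "Diagnosis": re.compile(r"[^,;]+"),   # items separated by commas or semicolons
}


def extract_blocks(text):
    """Stage 1: split the document into (label, body) blocks."""
    return [
        (m.group("label").strip(), m.group("body").strip())
        for m in BLOCK_RULE.finditer(text)
    ]


def extract_record(text):
    """Stage 2: apply the domain rule registered for each block label."""
    record = {}
    for label, body in extract_blocks(text):
        rule = DOMAIN_RULES.get(label)
        if rule is None:
            continue  # no domain rule for this block, so it is skipped
        record[label] = [hit.strip() for hit in rule.findall(body) if hit.strip()]
    return record


# Made-up sample document, for illustration only.
SAMPLE = (
    "Age: 54 years\n"
    "Diagnosis: hypertension, diabetes\n"
    "Notes: follow-up in 3 months\n"
)

record = extract_record(SAMPLE)
print(record)
```

The resulting dictionary is the kind of structured input that a downstream mining method could consume, in contrast to a bag-of-words representation of the same text.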

Year	Venue	Keywords
2008	LWA	rule based, structured data, information extraction
Field	DocType	Citations
Rule-based system, Text mining, Information retrieval, Computer science, Data acquisition, Information extraction, Data model	Conference	7
PageRank	References	Authors
0.80	7	3

Authors (3 rows)

Name	Order	Citations	PageRank
Martin Atzmüller	1	210	23.00
Peter Klügl	2	43	7.00
Frank Puppe	3	649	99.50
