Abstract |
---|
Information extraction is concerned with locating specific items in (unstructured) textual documents; one application is the acquisition of structured data. The acquired data can then be used by mining methods that require structured input, in contrast to other text mining methods that use a bag-of-words approach. This paper presents a semi-automatic approach for structured data acquisition using a rule-based information extraction system. We propose a semi-automatic process model that includes the TEXTMARKER system for information extraction and data acquisition from textual documents. TEXTMARKER applies simple rules for extracting blocks from a given (semi-structured) document, which can then be analyzed further using domain-specific rules. Thus, both low-level and higher-level information extraction are supported. We demonstrate the applicability and benefit of the approach with case studies of two real-world applications. |

Year | Venue | Keywords |
---|---|---|
2008 | LWA | rule based,structured data,information extraction |

Field | DocType | Citations |
---|---|---|
Rule-based system,Text mining,Information retrieval,Computer science,Data acquisition,Information extraction,Data model | Conference | 7 |

PageRank | References | Authors |
---|---|---|
0.80 | 7 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Martin Atzmüller | 1 | 210 | 23.00 |
Peter Klügl | 2 | 43 | 7.00 |
Frank Puppe | 3 | 649 | 99.50 |
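
The abstract describes a two-level scheme: simple rules first segment a (semi-structured) document into blocks, and domain-specific rules then analyze those blocks to fill a structured record. The following is a minimal Python sketch of that idea only, assuming plain regular expressions as rules. It is not TEXTMARKER's rule language or API. The functions `extract_blocks` and `acquire_record`, the `DOMAIN_RULES` slots, and the `SAMPLE` text are all hypothetical.

```python
import re

def extract_blocks(document: str) -> list[str]:
    """Low-level rule (hypothetical): a block is text separated by blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", document) if b.strip()]

# Higher-level, domain-specific rules (hypothetical): each maps a slot name
# of the target record to a regex whose first group captures the slot value.
DOMAIN_RULES = {
    "diagnosis": re.compile(r"Diagnosis:\s*(.+)"),
    "age": re.compile(r"\baged?\s+(\d+)", re.IGNORECASE),
    "therapy": re.compile(r"Therapy:\s*(.+)"),
}

def acquire_record(document: str, rules: dict) -> dict:
    """Apply the domain rules to each block; the first match per slot wins."""
    record = {}
    for block in extract_blocks(document):
        for slot, pattern in rules.items():
            if slot in record:
                continue  # slot already filled by an earlier block
            match = pattern.search(block)
            if match:
                record[slot] = match.group(1).strip()
    return record

# Hypothetical input document, not taken from the paper's case studies.
SAMPLE = (
    "Patient report\n"
    "\n"
    "Patient aged 54 was admitted.\n"
    "\n"
    "Diagnosis: gallstones\n"
    "Therapy: cholecystectomy\n"
)

if __name__ == "__main__":
    # prints {'age': '54', 'diagnosis': 'gallstones', 'therapy': 'cholecystectomy'}
    print(acquire_record(SAMPLE, DOMAIN_RULES))
```

Keeping block segmentation separate from the domain rules mirrors the low-level/higher-level split described in the abstract: the segmentation rule can be reused across domains, while only the slot rules need to change for a new application.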