Title
A Reference Architecture to Devise Web Information Extractors.
Abstract
The Web is the largest repository of human-friendly information. Unfortunately, web information is embedded in formatting tags and is surrounded by irrelevant information. Researchers are working on information extractors that transform this information into structured data so that it can later be integrated into automated processes. Devising a new information extraction technique requires an array of tasks that are specific to the technique and many tasks that are actually common to all techniques. The literature lacks a reference architectural proposal to guide software engineers in the design and implementation of information extractors; as a result, there is little reuse and the focus is usually blurred by irrelevant details. In this paper, we present a reference architecture to design and implement rule learners for information extractors. We have implemented a software framework to support our architecture, and we have validated it by means of four case studies and a number of experiments that prove that our proposal helps reduce development costs significantly.
Year
2012
DOI
10.1007/978-3-642-31069-0_21
Venue
ADVANCED INFORMATION SYSTEMS ENGINEERING WORKSHOPS, CAISE 2012
Keywords
Information Extraction, Rule Learning Reference Architecture
Field
Website architecture, Data mining, Data architecture, World Wide Web, Information retrieval, Computer science, Software, Information extraction, Disk formatting, Reference architecture, Data model, Software framework
DocType
Conference
Volume
112
ISSN
1865-1348
Citations
3
PageRank
0.39
References
21
Authors
2
Name                Order   Citations   PageRank
Hassan A. Sleiman   1       103         8.33
Rafael Corchuelo    2       389         49.87