A Reference Architecture to Devise Web Information Extractors. - Citegraph

Paper Info

Title
A Reference Architecture to Devise Web Information Extractors.

Abstract
The Web is the largest repository of human-friendly information. Unfortunately, web information is embedded in formatting tags and is surrounded by irrelevant information. Researchers are working on information extractors that transform this information into structured data for later integration into automated processes. Devising a new information extraction technique involves some tasks that are specific to that technique and many tasks that are common to all techniques. Because the literature lacks a reference architecture to guide software engineers in the design and implementation of information extractors, there is little reuse, and the focus is usually blurred by irrelevant details. In this paper, we present a reference architecture to design and implement rule learners for information extractors. We have implemented a software framework to support our architecture, and we have validated it by means of four case studies and a number of experiments that prove that our proposal helps reduce development costs significantly.

Year	2012
DOI	10.1007/978-3-642-31069-0_21
Venue	ADVANCED INFORMATION SYSTEMS ENGINEERING WORKSHOPS, CAISE 2012
Keywords	Information Extraction, Rule Learning Reference Architecture
Field	Website architecture, Data mining, Data architecture, World Wide Web, Information retrieval, Computer science, Software, Information extraction, Disk formatting, Reference architecture, Data model, Software framework
DocType	Conference
Volume	112
ISSN	1865-1348
Citations	3
PageRank	0.39
References	21
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Hassan A. Sleiman	1	103	8.33
Rafael Corchuelo	2	389	49.87
