Abstract |
---|
Most of the enormous amount of information on the Internet is available only as Web pages intended for human readers. These pages offer no common interface for accessing, searching, or browsing the data, so it is hard to extract semantic data from the Web, categorize it, and keep it up to date. For this purpose we have designed and implemented a system called AgentMat, aimed at the efficient extraction of large amounts of data from Web pages. AgentMat processing is based on an XML-based language that describes a given extraction task declaratively. A task description is composed of system components which, connected together, perform the desired functionality on a general Web page. Thanks to this scraping system, the raw content of irregularly updated and unstructured Web pages can be kept categorized and accessible together with semantic metadata. As a pilot implementation we have built the MediaPub system, which extracts information from various Web pages, categorizes it automatically, and checks for duplicates. |
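
The abstract describes AgentMat tasks as declarative XML documents whose elements correspond to system components that are connected into an extraction pipeline. As a rough illustration of that idea, here is a minimal sketch of what such a task description could look like; the element and attribute names (`task`, `fetch`, `extract`, `field`, `output`) and the XPath expressions are hypothetical assumptions for illustration, not the actual AgentMat schema, which the paper itself defines.

```xml
<!-- Hypothetical AgentMat-style task description (illustrative only;
     the paper defines the real schema). Each element stands for one
     system component; connecting them yields the extraction pipeline. -->
<task name="news-scrape">
  <!-- Component: download the target page -->
  <fetch url="http://example.com/news" encoding="utf-8"/>
  <!-- Component: select the data of interest, e.g. via XPath -->
  <extract>
    <field name="title" xpath="//div[@class='article']/h1"/>
    <field name="body"  xpath="//div[@class='article']/p"/>
    <field name="date"  xpath="//span[@class='published']"/>
  </extract>
  <!-- Component: emit the records together with semantic metadata -->
  <output format="xml" category="news"/>
</task>
```

A declarative description of this kind matches the abstract's claim that the same scraping engine can be applied to a general Web page: only the task document changes, not the extraction code.
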
Year | DOI | Venue
---|---|---
2009 | 10.1109/RCIS.2009.5089286 | RCIS

Keywords | Field | DocType
---|---|---
XML, information retrieval systems, meta data, semantic Web, software agents, AgentMat processing, Internet, MediaPub system, Web pages, World Wide Web, XML-based language, data scraping, information extraction, semantic metadata, semantization, system components, task description, categorizing, image duplicity check, multimedia database, semantic web, web scraping | Data mining, Metadata, Web scraping, World Wide Web, Information retrieval, Web page, Computer science, Semantic Web, Information extraction, Data scraping, Semantic data model, The Internet | Conference

ISSN | ISBN | Citations
---|---|---
2151-1349 | 978-1-4244-2865-6 | 3

PageRank | References | Authors
---|---|---
0.69 | 4 | 3
Name | Order | Citations | PageRank |
---|---|---|---
Miloslav Beno | 1 | 3 | 0.69 |
Jakub Mísek | 2 | 3 | 0.69 |
Filip Zavoral | 3 | 5 | 1.42 |