Extracting Information from Semi-structured Web Documents: A Framework - Citegraph

Paper Info

Title
Extracting Information from Semi-structured Web Documents: A Framework

Abstract
This article aims to automate the extraction of information from semi-structured web documents by minimizing the amount of hand coding. Extraction of information from the WWW can be used to structure the huge amount of data buried in web documents, so that data mining techniques can be applied. To achieve this target, automated extraction should be utilized to the extent possible since it must keep pace with a dynamic and chaotic Web on which analysis can be carried out using investigative data mining or social network analysis techniques. To achieve that goal, a proposed framework called Spiner will be presented and analyzed in this paper.

Year	2008
DOI	10.1007/978-3-540-89376-9_5
Venue	APWeb Workshops
Keywords	semi-structured web document,hand coding,extracting information,proposed framework,chaotic web,social network analysis technique,huge amount,investigative data mining,semi-structured web documents,automated extraction,web document,data mining technique,data mining,social network analysis
Field	Data mining,Web mining,Web intelligence,Web mapping,Computer science,Web standards,Data Web,Web modeling,Web application security,Social Semantic Web
DocType	Conference
Volume	4977
ISSN	0302-9743
Citations	0
PageRank	0.34
References	2
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Nasrullah Memon	1	504	56.67
Abdul Rasool Qureshi	2	0	0.34
David L. Hicks	3	373	52.18
Nicholas Harkiolakis	4	21	2.95
