Title
Extracting Information from Semi-structured Web Documents: A Framework
Abstract
This article aims to automate the extraction of information from semi-structured web documents while minimizing the amount of hand coding. Information extracted from the WWW can be used to structure the huge amount of data buried in web documents, so that data mining techniques can be applied. Automated extraction should be used to the extent possible, since it must keep pace with a dynamic and chaotic Web; the extracted data can then be analyzed using investigative data mining or social network analysis techniques. To that end, this paper presents and analyzes a proposed framework called Spiner.
Year
2008
DOI
10.1007/978-3-540-89376-9_5
Venue
APWeb Workshops
Keywords
semi-structured web document, hand coding, extracting information, proposed framework, chaotic web, social network analysis technique, huge amount, investigative data mining, semi-structured web documents, automated extraction, web document, data mining technique, data mining, social network analysis
Field
Data mining, Web mining, Web intelligence, Web mapping, Computer science, Web standards, Data Web, Web modeling, Web application security, Social Semantic Web
DocType
Conference
Volume
4977
ISSN
0302-9743
Citations
0
PageRank
0.34
References
2
Authors
4
Name                   Order  Citations  PageRank
Nasrullah Memon        1      504        56.67
Abdul Rasool Qureshi   2      0          0.34
David L. Hicks         3      373        52.18
Nicholas Harkiolakis   4      21         2.95