Title
Automatically extracting web data records
Abstract
It is essential for Web applications such as e-commerce portals to enrich their existing content offerings by aggregating relevant structured data (e.g., product reviews) from external Web resources. To meet this goal, we present in this paper an algorithm for automatically extracting data records from Web pages. The algorithm uses a robust string matching technique to accurately identify the records in a Web page. Our experiments on diverse datasets (including datasets from third-party research projects) show that the proposed algorithm is highly effective and performs considerably better than two other state-of-the-art automatic data extraction systems. We have made the proposed system publicly accessible so that readers can evaluate it.
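The abstract does not spell out the string matching technique, so the following is only a rough illustration of the general idea and not the authors' method. It is a minimal sketch that rests on two assumptions. First, data records appear as consecutive sibling elements with similar tag structure. Second, each sibling's subtree can be serialized into a sequence of tag names and compared with a normalized Levenshtein similarity. The names `TreeBuilder`, `find_record_runs`, and `extract_records`, and the 0.7 threshold, are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical sketch, NOT the paper's algorithm: records are assumed to be
# consecutive sibling elements whose tag structures are similar "strings".

# HTML void elements never have an end tag, so we do not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

class TreeBuilder(HTMLParser):
    """Builds a minimal tag tree; text and attributes are ignored."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

def tag_sequence(node):
    """Serialize a subtree as a pre-order list of tag names (its 'string')."""
    seq = [node.tag]
    for child in node.children:
        seq.extend(tag_sequence(child))
    return seq

def edit_distance(a, b):
    """Levenshtein distance, treating each tag name as one symbol."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical tag sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def find_record_runs(children, threshold=0.7, min_run=2):
    """Return (start, end) slices of runs where each sibling resembles its predecessor."""
    seqs = [tag_sequence(c) for c in children]
    runs, start = [], 0
    for i in range(1, len(seqs) + 1):
        if i == len(seqs) or similarity(seqs[i - 1], seqs[i]) < threshold:
            if i - start >= min_run:
                runs.append((start, i))
            start = i
    return runs

def extract_records(html, threshold=0.7):
    """Return groups of sibling nodes that look like repeated data records."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    groups, stack = [], [builder.root]
    while stack:
        node = stack.pop()
        for start, end in find_record_runs(node.children, threshold):
            groups.append(node.children[start:end])
        stack.extend(node.children)
    return groups

PAGE = """
<div id="results">
  <div class="item"><h3>A</h3><span>$10</span><p>Good</p></div>
  <div class="item"><h3>B</h3><span>$12</span><p>Fine</p></div>
  <div class="item"><h3>C</h3><span>$9</span></div>
</div>
<p>Footer</p>
"""

if __name__ == "__main__":
    for group in extract_records(PAGE):
        print([tag_sequence(n) for n in group])
    # Expected: [['div', 'h3', 'span', 'p'], ['div', 'h3', 'span', 'p'], ['div', 'h3', 'span']]
```

On the sample page, the three item `div`s form a single group. This holds even though the last item lacks a `p` element, because its similarity to its neighbour is 0.75. That tolerance to missing fields is the kind of robustness a record matcher needs. A practical system would likely also compare non-adjacent siblings and use text and attributes, which this sketch ignores.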
Year
2010
DOI
10.1007/978-3-642-15470-6_51
Venue
AMT (Active Media Technology)
Keywords
web data record, relevant structured data, external web resource, proposed system, diverse datasets, state-of-the-art automatic data extraction, e-commerce portal, proposed algorithm, web page, web application, data record, e-commerce, string matching, web pages, structured data
Field
Data mining, Web mining, Information retrieval, Web page, Web mapping, Computer science, Data Web, Web modeling, Data extraction, Web application, Web service
DocType
Conference
Volume
6335
ISSN
0302-9743
ISBN
3-642-15469-7
Citations
1
PageRank
0.36
References
8
Authors
3
Name | Order | Citations | PageRank
Dheerendranath Mundluru | 1 | 12 | 2.16
Vijay V. Raghavan | 2 | 2544 | 506.92
Zonghuan Wu | 3 | 492 | 27.08