Automatically extracting web data records - Citegraph

Paper Info

Title
Automatically extracting web data records

Abstract
It is essential for Web applications such as e-commerce portals to enrich their existing content offerings by aggregating relevant structured data (e.g., product reviews) from external Web resources. To meet this goal, we present an algorithm for automatically extracting data records from Web pages. The algorithm uses a robust string matching technique to accurately identify the records in a Web page. Our experiments on diverse datasets (including datasets from third-party research projects) show that the proposed algorithm is highly effective and performs considerably better than two other state-of-the-art automatic data extraction systems. We have made the proposed system publicly accessible so that readers can evaluate it.
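
The abstract does not spell out the string matching technique, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes that data records appear as adjacent sibling subtrees with similar tag sequences, and it swaps in Python's standard `difflib.SequenceMatcher` as the similarity measure. The names `TagSequencer`, `tag_sequences`, `similar_runs`, and the 0.8 threshold are hypothetical choices made for this example.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagSequencer(HTMLParser):
    """Collect one tag-name sequence per child subtree of the root element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.records = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 1:          # a new child of the root starts here
            self.records.append([])
        if self.depth >= 1:          # record every tag inside that child
            self.records[-1].append(tag)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1


def tag_sequences(html):
    """Return the tag sequences of the root element's children."""
    parser = TagSequencer()
    parser.feed(html)
    parser.close()
    return parser.records


def similar_runs(records, threshold=0.8):
    """Group indices of adjacent subtrees whose tag sequences are similar.

    The threshold is an assumed value, not one taken from the paper.
    """
    runs, current = [], []
    for i, rec in enumerate(records):
        if current and SequenceMatcher(None, records[current[-1]], rec).ratio() >= threshold:
            current.append(i)
        else:
            if len(current) > 1:
                runs.append(current)
            current = [i]
    if len(current) > 1:
        runs.append(current)
    return runs


# A made-up page: three review-like blocks surrounded by unrelated markup.
PAGE = (
    "<div>"
    "<h1>Reviews</h1>"
    "<div><b>A</b><p>good</p></div>"
    "<div><b>B</b><p>bad</p><i>x</i></div>"
    "<div><b>C</b><p>ok</p></div>"
    "<form><input></form>"
    "</div>"
)

records = tag_sequences(PAGE)
runs = similar_runs(records)
print(runs)
```

Comparing tag sequences rather than text lets the second block, which carries an extra `<i>` element, still group with its neighbours. This tolerance for small structural variation is the sense in which approximate matching could be called robust here.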

Year	2010
DOI	10.1007/978-3-642-15470-6_51
Venue	AMT
Keywords	web data record, relevant structured data, external web resource, proposed system, diverse datasets, state-of-the-art automatic data extraction, e-commerce portal, proposed algorithm, web page, web application, data record, e commerce, string matching, web pages, structured data
Field	Data mining, Web mining, Information retrieval, Web page, Web mapping, Computer science, Data Web, Web modeling, Data extraction, Web application, Web service
DocType	Conference
Volume	6335
ISSN	0302-9743
ISBN	3-642-15469-7
Citations	1
PageRank	0.36
References	8
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Dheerendranath Mundluru	1	12	2.16
Vijay V. Raghavan	2	2544	506.92
Zonghuan Wu	3	492	27.08
