Abstract | ||
---|---|---|
On-line retailers and e-shoppers alike are interested in gathering product records from the Web in order to compare products and prices. Consumers compare products and prices to find the best price for a specific product or to identify alternatives to it, whereas on-line retailers need to compare their offers with those of their competitors in order to remain competitive. Because the number and variety of product offers on the Web is huge, the product data needs to be collected through an automated approach. The contribution of this paper is a novel approach for automatically identifying and extracting product records from arbitrary e-shop websites. The approach extends an existing technique called Tag Path Clustering, which clusters similar HTML tag paths. The clustering mechanism is combined with a novel filtering mechanism for identifying the product records to be extracted from the websites. |
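
To make the idea of working with HTML tag paths more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a tag path is the sequence of tag names from the document root to an element, and it substitutes plain grouping by identical path for the paper's clustering step. The names `TagPathCollector`, `group_tag_paths`, `candidate_record_paths` and the `min_repeats` threshold are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Records the root-to-element tag path of every element in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags that are currently open
        self.paths = []   # one tag path per element, in document order

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the innermost matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def group_tag_paths(html):
    """Group element positions by identical tag path.

    Assumption: exact-match grouping is a crude stand-in for clustering
    *similar* tag paths as described in the abstract.
    """
    collector = TagPathCollector()
    collector.feed(html)
    groups = defaultdict(list)
    for position, path in enumerate(collector.paths):
        groups[path].append(position)
    return dict(groups)


def candidate_record_paths(groups, min_repeats=3):
    """Hypothetical filter: keep only tag paths that repeat often enough."""
    return [path for path, positions in groups.items()
            if len(positions) >= min_repeats]
```

Repeated tag paths are a natural signal because product listings are usually rendered from one template, so every product record tends to share the same path. A real filter would need more than a repetition count, for example to separate product lists from navigation menus; the abstract does not specify how the authors' filtering mechanism does this.
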
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/978-3-319-30996-5_12 | Lecture Notes in Business Information Processing |
Keywords | Field | DocType |
Web data extraction, Product record extraction, Tag path clustering | HTML element, World Wide Web, Information retrieval, Computer science, Filter (signal processing), Product data, Cluster analysis, Competitor analysis | Conference |
Volume | ISSN | Citations |
246 | 1865-1348 | 0 |
PageRank | References | Authors |
0.34 | 7 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Andrea Horch | 1 | 2 | 1.45 |
Holger Kett | 2 | 4 | 1.76 |
Anette Weisbecker | 3 | 202 | 34.72 |