Title
Extracting Facets from Lost Fine-Grained Categorizations in Dataspaces.
Abstract
Categorization of instances in dataspaces is a difficult and time-consuming task, usually performed by domain experts. In this paper we propose a semi-automatic approach to the extraction of facets for the fine-grained categorization of instances in dataspaces. We focus on the case where instances are categorized under heterogeneous taxonomies in several sources. Our approach leverages Taxonomy Layer Distance, a new metric based on structural analysis of source taxonomies, to support the identification of meaningful candidate facets. Once validated and refined by domain experts, the extracted facets provide a fine-grained classification of dataspace instances. We implemented and evaluated our approach in a real-world dataspace in the eCommerce domain. Experimental results show that our approach is capable of extracting meaningful facets and that the new metric we propose for the structural analysis of source taxonomies outperforms other state-of-the-art metrics.
Year
2014
DOI
10.1007/978-3-319-07881-6_39
Venue
ADVANCED INFORMATION SYSTEMS ENGINEERING (CAISE 2014)
Keywords
dataspaces, web data integration, taxonomy integration, facet extraction
DocType
Conference
Volume
8484
ISSN
0302-9743
Citations
1
PageRank
0.35
References
23
Authors
3
Name
Order
Citations
PageRank
Riccardo Porrini1242.88
Matteo Palmonari245044.73
Carlo Batini32122665.07