Abstract |
---|
Semantic labeling is the process of mapping attributes in data sources to classes in an ontology, and it is a necessary step in heterogeneous data integration. Variations in data formats, attribute names, and even value ranges make this a very challenging task. In this paper, we present a novel domain-independent approach to automatic semantic labeling that uses machine learning techniques. Previous approaches use machine learning to learn a model that extracts features related to the data of a domain, which requires the model to be retrained for every new domain. Our solution instead uses similarity metrics as features to compare data against labeled domain data, and learns a matching function to infer the correct semantic labels. Because our approach depends on the learned similarity metrics rather than on the data itself, it is domain-independent and needs to be trained only once to work effectively across multiple domains. In our evaluation, our approach achieves higher accuracy than other approaches, even when the learned models are trained on domains other than the test domain. |
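To make "similarity metrics as features" concrete, here is a minimal Python sketch under stated assumptions. It uses two illustrative features (the Jaccard index over attribute values and over attribute-name tokens), hypothetical ontology class names such as `schema:City`, and a plain average standing in for the matching function. The abstract says that function is learned (the paper's keywords list random forests), so this is not the authors' implementation; all function names here are hypothetical.

```python
from statistics import mean


def jaccard(a: set, b: set) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|, taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def name_tokens(name: str) -> set:
    """Split an attribute name such as 'birth_date' or 'City Name' into lowercase tokens."""
    return set(name.lower().replace("_", " ").split())


def similarity_features(col_name, col_values, ref_name, ref_values) -> list:
    """Compare an unlabeled column with one labeled reference column (hypothetical features).

    The features describe how similar the two columns are, not what the data is,
    so they carry the same meaning in every domain.
    """
    return [
        jaccard(set(col_values), set(ref_values)),              # value overlap
        jaccard(name_tokens(col_name), name_tokens(ref_name)),  # attribute-name overlap
    ]


def match_score(features: list) -> float:
    """Placeholder matching function (assumption): a plain average of the features.

    The abstract describes a learned matching function; a trained classifier
    would replace this average.
    """
    return mean(features)


def predict_label(col_name, col_values, labeled_columns: dict):
    """Return the best semantic label and the score of every candidate label.

    labeled_columns maps a semantic label to (reference_name, reference_values).
    """
    scores = {
        label: match_score(similarity_features(col_name, col_values, ref_name, ref_values))
        for label, (ref_name, ref_values) in labeled_columns.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical labeled data; the class names are illustrative only.
    labeled = {
        "schema:City": ("city", ["Los Angeles", "Boston", "Chicago"]),
        "schema:Person": ("artist name", ["Claude Monet", "Mary Cassatt"]),
    }
    label, scores = predict_label("City Name", ["Boston", "Seattle", "Chicago"], labeled)
    print(label)  # → schema:City
```

Because `match_score` only sees feature vectors and never raw values, the same function can score columns from any domain, which is the property the abstract relies on for domain independence. Replacing the average with a classifier trained on labeled (feature vector, match / no-match) pairs would bring the sketch closer to the learned matcher the abstract describes.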
Property | Value |
---|---|
Year | 2016 |
DOI | 10.1007/978-3-319-46523-4_27 |
Venue | Lecture Notes in Computer Science |
Field | Data integration, Ontology, Data mining, Computer science, Semantic labeling, Natural language processing, Artificial intelligence, Jaccard index, Random forest |
DocType | Conference |
Volume | 9981 |
ISSN | 0302-9743 |
Citations | 12 |
PageRank | 0.61 |
References | 8 |
Authors | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Minh Pham | 1 | 14 | 2.68 |
Suresh Alse | 2 | 12 | 0.95 |
Craig A. Knoblock | 3 | 5229 | 680.57 |
Pedro Szekely | 4 | 1217 | 179.80 |