Title | ||
---|---|---|
An unsupervised approach for acquiring ontologies and RDF data from online life science databases |
Abstract | ||
---|---|---|
In the Linked Open Data cloud, one of the largest data sets, comprising 2.5 billion triples, is derived from the Life Science domain. Yet this represents a small fraction of the total number of publicly available data sources on the Web. We briefly describe past attempts to transform specific Life Science sources from a plethora of open as well as proprietary formats into RDF data. In particular, we identify and tackle two bottlenecks in current practice: acquiring ontologies to formally describe these data and creating “RDFizer” programs to convert data from legacy formats into RDF. We propose an unsupervised method, based on transformation rules, for performing these two key tasks, which makes use of our previous work on unsupervised wrapper induction for extracting labelled data from complete Life Science Web sites. We apply our approach to 13 real-world online Life Science databases. The learned ontologies are evaluated by domain experts as well as against gold standard ontologies. Furthermore, we compare the learned ontologies against ontologies that are “lifted” directly from the underlying relational schema using an existing unsupervised approach. Finally, we apply our approach to three online databases to extract RDF data. Our results indicate that this approach can be used to bootstrap and speed up the migration of life science data into the Linked Open Data cloud. |
Year | DOI | Venue |
---|---|---|
2010 | 10.1007/978-3-642-13489-0_22 | ESWC (2) |
Keywords | Field | DocType |
online life science databases,life science domain,rdf data,existing unsupervised approach,largest data set,labelled data,available data source,life science databases,linked open data cloud,life science data,complete life science web,gold standard,linked open data | Data science,Ontology (information science),Data mining,Data set,Computer science,Linked data,Formal concept analysis,Schema (psychology),RDF Schema,Database,RDF,Cloud computing | Conference |
Volume | ISSN | ISBN |
6089 | 0302-9743 | 3-642-13488-2 |
Citations | PageRank | References |
1 | 0.39 | 22 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Saqib Mir | 1 | 225 | 19.96 |
Steffen Staab | 2 | 6658 | 593.89 |
Isabel Rojas | 3 | 366 | 30.30 |