Title
Mining Linked Open Data: A Case Study With Genes Responsible For Intellectual Disability
Abstract
Linked Open Data (LOD) constitute a unique dataset that is in a standard format, is partially integrated, and facilitates connections with domain knowledge represented within semantic web ontologies. Increasing amounts of biomedical data provided as LOD consequently offer novel opportunities for knowledge discovery in biomedicine. However, most data mining methods are adapted neither to the LOD format nor to the use of domain knowledge. In this paper, we propose an approach for selecting, integrating, and mining LOD with the goal of discovering genes responsible for a disease. The selection step relies on a set of choices made by a domain expert to isolate relevant pieces of LOD. Because these pieces are potentially not linked to one another, an integration step is required to connect them. The resulting graph is subsequently mined using Inductive Logic Programming (ILP), which presents two main advantages. First, the input format compliant with ILP is close to the format of LOD. Second, domain knowledge can be added to this input and considered by ILP. We have implemented and applied this approach to the characterization of genes responsible for intellectual disability. On the basis of this real-world use case, we present an evaluation of our mining approach and discuss its advantages and drawbacks for the mining of biomedical LOD.
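The abstract's remark that the ILP input format is close to the LOD format can be illustrated with a minimal sketch: an RDF-style triple (subject, predicate, object) maps almost directly onto a Prolog-style fact predicate(subject, object), the kind of input ILP systems typically accept. The abstract does not describe the authors' actual encoding, so the helper names (local_name, triples_to_facts) and the example.org URIs below are hypothetical and serve only as illustration.

```python
from typing import Iterable, List, Tuple

# A triple is (subject URI, predicate URI, object URI); all URIs here are made up.
Triple = Tuple[str, str, str]


def local_name(uri: str) -> str:
    """Reduce a URI to a lowercase, Prolog-safe atom (hypothetical helper)."""
    tail = uri.rstrip("/#")
    # Keep only what follows the last '#' and then the last '/'.
    for sep in ("#", "/"):
        tail = tail.rsplit(sep, 1)[-1]
    # Replace anything that is not a letter or digit with an underscore.
    safe = "".join(c if c.isalnum() else "_" for c in tail).lower()
    # Prolog atoms must start with a lowercase letter.
    return safe if safe and safe[0].isalpha() else "x_" + safe


def triples_to_facts(triples: Iterable[Triple]) -> List[str]:
    """Turn each (s, p, o) triple into a fact string of the form p(s, o)."""
    return [
        f"{local_name(p)}({local_name(s)}, {local_name(o)})."
        for s, p, o in triples
    ]


if __name__ == "__main__":
    example = [
        (
            "http://example.org/gene/MECP2",
            "http://example.org/vocab#involvedIn",
            "http://example.org/process/GO_0006355",
        )
    ]
    for fact in triples_to_facts(example):
        print(fact)
```

In a sketch like this, domain knowledge such as ontology subclass relations could be written as further facts or rules alongside the data facts, which corresponds to the second advantage the abstract attributes to ILP.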
Year
2014
DOI
10.1007/978-3-319-08590-6_2
Venue
DATA INTEGRATION IN THE LIFE SCIENCES, DILS 2014
Field
Inductive logic programming, Data integration, Data mining, Domain knowledge, Computer science, Subject-matter expert, Semantic Web, Linked data, SPARQL, Knowledge extraction, Database
DocType
Conference
Volume
8574
ISSN
0302-9743
Citations
0
PageRank
0.34
References
21
Authors
7