Title
Information integration and knowledge acquisition from semantically heterogeneous biological data sources
Abstract
We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal structure and query interfaces) as though they were a collection of tables structured according to an ontology supplied by the user. This allows INDUS to answer user queries against distributed, semantically heterogeneous data sources without the need for a centralized data warehouse or a common global ontology. We used the INDUS framework to design algorithms for learning probabilistic models (e.g., Naive Bayes models) that predict the GO functional classification of a protein from training sequences distributed between the SWISSPROT and MIPS data sources. Mappings such as EC2GO and MIPS2GO were used to resolve the semantic differences between these data sources when answering queries posed by the learning algorithms. Our results show that INDUS can be used successfully for the integrative analysis of data from multiple sources, as needed for collaborative discovery in computational biology.
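The idea of training a learner by posing queries to each source through ontology mappings, rather than pooling raw data in a warehouse, can be illustrated with a minimal sketch. It assumes the queries are count queries (class and class-attribute counts are sufficient statistics for a Naive Bayes classifier) and that each source's class labels can be translated into the user's ontology by a simple lookup table. This is not the INDUS implementation: the toy records, the label identifiers, the mapping tables (hypothetical stand-ins for mappings such as EC2GO and MIPS2GO, not their real content), and the function names count_query, learn and predict are all invented for illustration.

```python
from collections import Counter
from math import log

# Hypothetical toy data: each record is (attribute tuple, source-specific class label).
SOURCE_EC = [
    (("m1", "short"), "EC:a"),
    (("m1", "long"), "EC:a"),
    (("m2", "long"), "EC:b"),
]
SOURCE_MIPS = [
    (("m1", "short"), "MIPS:x"),
    (("m2", "short"), "MIPS:y"),
    (("m2", "long"), "MIPS:y"),
]

# Hypothetical mappings from each source vocabulary into the user's ontology.
# They only stand in for mappings such as EC2GO / MIPS2GO; the entries are made up.
EC_TO_USER = {"EC:a": "GO:t1", "EC:b": "GO:t2"}
MIPS_TO_USER = {"MIPS:x": "GO:t1", "MIPS:y": "GO:t2"}


def count_query(records, mapping):
    """Answer a count query at one source, translating its class labels
    into user-ontology terms before counting."""
    class_counts = Counter()
    joint_counts = Counter()  # keys: (class, attribute index, attribute value)
    for attrs, label in records:
        c = mapping[label]
        class_counts[c] += 1
        for i, v in enumerate(attrs):
            joint_counts[(c, i, v)] += 1
    return class_counts, joint_counts


def learn(sources):
    """Sum the counts returned by every source. Counts are additive,
    so only these statistics, not raw records, leave each source."""
    class_counts, joint_counts = Counter(), Counter()
    for records, mapping in sources:
        cc, jc = count_query(records, mapping)
        class_counts.update(cc)
        joint_counts.update(jc)
    return class_counts, joint_counts


def predict(model, attrs, n_values):
    """Return the most probable user-ontology class using Laplace smoothing;
    n_values[i] is the number of distinct values attribute i can take."""
    class_counts, joint_counts = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, nc in class_counts.items():
        score = log(nc / total)  # log prior
        for i, v in enumerate(attrs):
            score += log((joint_counts[(c, i, v)] + 1) / (nc + n_values[i]))
        if score > best_score:
            best, best_score = c, score
    return best


model = learn([(SOURCE_EC, EC_TO_USER), (SOURCE_MIPS, MIPS_TO_USER)])
print(predict(model, ("m1", "short"), n_values=[2, 2]))  # prints GO:t1
```

Because both sources are queried in the same user-ontology vocabulary, records labelled EC:a and MIPS:x contribute to the same class GO:t1; without the mappings the learner would see four unrelated classes.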
Year: 2005
DOI: 10.1007/11530084_15
Venue: DILS
Keywords: common global ontology, intelligent data understanding system, information integration, collaborative discovery, indus framework, centralized data warehouse, user query, knowledge acquisition, naive bayes model, semantically heterogeneous data source, semantically heterogeneous biological data, data source, mips data source, computational biology, probabilistic model, biological data, naive bayes, data warehouse, ontology mapping, col
Field: Data warehouse, Ontology (information science), Information integration, Ontology, Biological data, Data mining, Ontology-based data integration, Data analysis, Information retrieval, Computer science, Database, Knowledge acquisition
DocType: Conference
Volume: 3615
ISSN: 0302-9743
ISBN: 3-540-27967-9
Citations: 12
PageRank: 0.70
References: 17
Authors: 7
Name | Order | Citations | PageRank
Doina Caragea | 1 | 586 | 63.35
Jyotishman Pathak | 2 | 677 | 76.52
Jie Bao | 3 | 243 | 22.09
Adrian Silvescu | 4 | 327 | 27.68
Carson Andorf | 5 | 12 | 0.70
Drena Dobbs | 6 | 423 | 35.43
Vasant Honavar | 7 | 3353 | 468.10