Title
IRIS-TCGA: An Information Retrieval and Integration System for Genomic Data of Cancer.
Abstract
Data integration is one of the most challenging research topics in many knowledge domains, and biology is surely one of them. However, the theory and state-of-the-art methods make this task complex for most small research centers. Fortunately, several organizations are focusing on collecting heterogeneous data, which makes it easier to design analysis tools and to test biological and medical hypotheses on integrated data. One of the most evident cases of such efforts is The Cancer Genome Atlas (TCGA), a database that contains a large variety of information related to different types of cancer. This database offers a great opportunity to those interested in analyzing integrated data; however, exploiting it is not easy, since nontrivial effort is required to extract and combine the data before they can be analyzed from an integrated perspective. In this paper we present IRIS-TCGA, an online web service developed to perform multiple queries for data integration on TCGA. Unlike other tools that have been proposed to interact with TCGA, IRIS-TCGA allows direct access to the data and makes it possible to extract detailed combinations of subsets of the repository, according to filters and high-order queries. The structure of the system is simple, as it is built on two main operators, union and intersection, which are then used to construct queries of higher complexity. The first version of the system supports the extraction and integration of gene expression (RNA-sequencing, microarrays), DNA-methylation, and DNA-sequencing (mutations) data from experiments on tissues of patients, together with their related metadata, in a gene-oriented organization. The extracted data matrices are particularly suited for data mining applications (e.g., classification).
Finally, we show two application examples, where IRIS-TCGA is used for integrating genomic data from RNA-sequencing and DNA-methylation experiments, and where state-of-the-art bioinformatics analysis tools are applied to the integrated data in order to extract new knowledge from them. IRIS-TCGA is freely available at http://bioinf.iasi.cnr.it/iristcga/.
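The abstract says that queries are built from two operators, union and intersection, and that the result is a gene-oriented data matrix. The sketch below is a minimal illustration of that idea on toy data. It is not the IRIS-TCGA interface: the sample identifiers, the values, and the names `union`, `intersection`, and `integrate` are all assumptions made for this example.

```python
# Hypothetical illustration only: this is NOT the IRIS-TCGA API.
# Toy data: each experiment maps a sample id to {gene: measured value}.
from typing import Dict, Set

Experiment = Dict[str, Dict[str, float]]

rnaseq: Experiment = {
    "S1": {"TP53": 7.2, "BRCA1": 3.1},
    "S2": {"TP53": 5.9, "BRCA1": 4.4},
    "S3": {"TP53": 6.5, "BRCA1": 2.8},
}
methylation: Experiment = {
    "S2": {"TP53": 0.12, "BRCA1": 0.80},
    "S3": {"TP53": 0.35, "BRCA1": 0.41},
    "S4": {"TP53": 0.22, "BRCA1": 0.67},
}


def union(a: Set[str], b: Set[str]) -> Set[str]:
    """Samples present in at least one of the two sets."""
    return a | b


def intersection(a: Set[str], b: Set[str]) -> Set[str]:
    """Samples present in both sets."""
    return a & b


def integrate(samples: Set[str], **experiments: Experiment) -> Dict[str, Dict[str, float]]:
    """Build one row per sample and one column per (experiment, gene) pair."""
    matrix: Dict[str, Dict[str, float]] = {}
    for sample in sorted(samples):
        row: Dict[str, float] = {}
        for name, experiment in experiments.items():
            for gene, value in experiment.get(sample, {}).items():
                row[f"{name}:{gene}"] = value
        matrix[sample] = row
    return matrix


# Basic queries: samples measured in both experiments, or in either one.
both = intersection(set(rnaseq), set(methylation))
either = union(set(rnaseq), set(methylation))

# A higher-order query composed from the two operators:
# samples in both experiments, plus one extra sample of interest.
composed = union(both, {"S1"})

# Integrated matrix restricted to samples that have both data types.
matrix = integrate(both, rnaseq=rnaseq, methylation=methylation)
```

Restricting the matrix to the intersection keeps every row complete, which is convenient for classification; using the union instead would leave missing columns for samples that lack one of the data types.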
Year: 2016
Venue: CIBB
Field: Data integration, Metadata, Information retrieval, Computer science, Bioinformatics analysis, Knowledge extraction, Bioinformatics, Web service, Design analysis
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name                Order  Citations  PageRank
Fabio Cumbo         1      15         4.12
Emanuel Weitschek   2      84         10.63
Paola Bertolazzi    3      352        32.81
Giovanni Felici     4      201        21.98