Title
Linked cancer genome atlas database
Abstract
The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional pilot project to create an atlas of genetic mutations responsible for cancer. One of the aims of this project is to develop an infrastructure for making cancer-related data publicly accessible, enabling cancer researchers anywhere in the world to make and validate important discoveries. However, data in the Cancer Genome Atlas are organized as text archives in a set of directories. Devising bioinformatics applications to analyse such data is still challenging, as it requires downloading very large archives and parsing the relevant text files in order to collect the critical covariates necessary for analysis. Furthermore, the various types of experimental results are not connected biologically. To truly exploit the data in the genome-wide context in which the TCGA project was devised, the data need to be converted into a structured representation and made publicly available for remote querying and virtual integration. In this work, we address these issues by RDFizing data from TCGA and linking its elements to the Linked Open Data (LOD) Cloud. The outcome is, to the best of our knowledge, the largest LOD data source, comprising over 30 billion triples. This data source can be exploited through publicly available SPARQL endpoints, providing an easy-to-use, time-efficient, and scalable solution for accessing the Cancer Genome Atlas. We also describe showcases enabled by the new linked data representation of the Cancer Genome Atlas presented in this paper.
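The RDFization step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes a tab-delimited input file with a header row, uses a hypothetical base namespace `http://example.org/tcga/` and hypothetical column names (`barcode`, `gene`, `value`), and emits every non-identifier column as a plain string literal in N-Triples syntax. It does not perform the linking to the LOD Cloud.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base namespace chosen for illustration; not the URIs used by Linked TCGA.
BASE = "http://example.org/tcga/"


def escape_literal(value):
    """Escape a string so it can appear inside an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rdfize_tsv(text, id_column):
    """Convert a tab-delimited table (with header row) into N-Triples lines.

    Each row becomes one subject URI built from its id_column value; every
    other non-empty column becomes a predicate with a plain literal object.
    """
    triples = []
    # QUOTE_NONE: treat quote characters as ordinary data, as in plain TSV dumps.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        subject = f"<{BASE}record/{quote(row[id_column])}>"
        for column, value in row.items():
            # Skip the identifier itself, overflow fields (key None) and empty cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}vocab/{quote(column)}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    # Hypothetical one-row record; real TCGA archives have many more columns.
    sample = "barcode\tgene\tvalue\nTCGA-XX-0001\tTP53\t0.5\n"
    for triple in rdfize_tsv(sample, "barcode"):
        print(triple)
```

Running the script prints two triples, one for `gene` and one for `value`, both attached to the subject `<http://example.org/tcga/record/TCGA-XX-0001>`. A fuller conversion might also type numeric literals (for example as `xsd:decimal`) and add links to external LOD resources, both of which this sketch leaves out.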
Year
2013
DOI
10.1145/2506182.2506200
Venue
I-SEMANTICS
Keywords
cancer researcher, rdfizing data, linked cancer genome atlas, multi-institutional pilot project, largest lod data source, available sparql endpoint, tcga project, cancer genome atlas, large archives, data source, data representation, sparql, lod
Field
Genome, Data mining, World Wide Web, Computer science, Upload, Linked data, Exploit, SPARQL, Parsing, Database, Cloud computing, Scalability
DocType
Conference
Citations
14
PageRank
0.81
References
5
Authors
6
Name | Order | Citations | PageRank
Muhammad Saleem | 1 | 14 | 0.81
Shanmukha S. Padmanabhuni | 2 | 24 | 1.34
Axel-Cyrille Ngonga Ngomo | 3 | 1775 | 139.40
Jonas S Almeida | 4 | 731 | 42.25
Stefan Decker | 5 | 5799 | 643.68
Helena F. Deus | 6 | 210 | 13.23