Title | ||
---|---|---|
Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce. |
Abstract | ||
---|---|---|
Motivation: Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyse dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data. Results: We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyse dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise. Availability and Implementation: Rail-RNA is available from http://rail.bio. Technical details on the Rail-dbGaP protocol as well as an implementation walkthrough are available at https://github.com/nellore/rail-dbgap. Detailed instructions on running Rail-RNA on dbGaP-protected data using Amazon Web Services are available at http://docs.rail.bio/dbgap/. Contacts: anellore@gmail.com or langmea@cs.jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online. |
Year | DOI | Venue |
---|---|---|
2016 | 10.1093/bioinformatics/btw177 | BIOINFORMATICS |
Field | DocType | Volume |
Software tool,Data mining,World Wide Web,Computer science,Amazon rainforest,Software,Bioinformatics,Amazon web services,Software walkthrough,Database,Cloud computing | Journal | 32 |
Issue | ISSN | Citations |
16 | 1367-4803 | 1 |
PageRank | References | Authors |
0.43 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Abhinav Nellore | 1 | 17 | 2.55 |
Christopher Wilks | 2 | 2 | 1.49 |
Kasper D. Hansen | 3 | 193 | 17.73 |
Jeffrey T. Leek | 4 | 148 | 13.79 |
Ben Langmead | 5 | 51 | 6.94 |