Title
Energy landscape analysis for regulatory RNA finding using scalable distributed cyberinfrastructure
Abstract
We investigate the folding energy landscape (EL) of a given RNA sequence through Boltzmann ensemble (BE) sampling of RNA secondary structures. The ensemble of sampled structures is used to derive distributions of energies and of base-pair distances between pairs of configurations. We identify structural features that can be utilized for RNA gene finding. Characterizing the EL through BE sampling of secondary structures is computationally demanding and involves multiple heterogeneous stages. To address these computational requirements, we develop the Distributed Adaptive Runtime Environment (DARE). DARE is built upon an extensible and interoperable pilot-job and supports the concurrent execution of a broad range of task sizes across a range of infrastructures. We use it to investigate two RNA systems of different sizes: S-adenosyl methionine (SAM)-binding RNA sequences known as SAM-I riboswitches, and the S gene of the bovine coronavirus (BCoV) RNA genome. We demonstrate how the implementation lowers the total time to solution as the RNA length, the number of sequences investigated, and the number of sampled structures increase. The distributions of energies and base-pair distances reveal variations in folding dynamics and pathways among the SAM riboswitch sequences. Our results for the BCoV RNA genome sequences also indicate that folding is sensitive to coding-neutral variations in sequence. Among the BE-sampled structures for all 2910 SAM-I sequences identified from Rfam (the curated ncRNA family database), we search for a characteristic motif of the SAM-I consensus structure, a four-way junction. We find that BE sampling provides insight into the variations in conformational distribution among sequences of the same ncRNA family. BE sampling of secondary structures is therefore a viable pre-processing or post-processing tool to complement comparative sequence analysis. 
The understanding gained shows how appropriately designed cyberinfrastructure can provide new insight into RNA folding and structure formation. Copyright © 2011 John Wiley & Sons, Ltd.
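The abstract relies on two structural measures computed over an ensemble of sampled secondary structures: the base-pair distance between two configurations and the presence of a four-way junction. The following is a minimal Python sketch under stated assumptions: each sampled structure is a pseudoknot-free dot-bracket string, the sampling itself (e.g. by a Boltzmann sampler) happens elsewhere, the base-pair distance is taken as the number of base pairs present in exactly one of the two structures, and a "four-way junction" is approximated as any loop closed by one base pair that encloses exactly three branching helices. All function names are hypothetical; this is not the authors' DARE pipeline or their motif-search code.

```python
from collections import Counter
from itertools import combinations


def pair_set(dot_bracket: str) -> set[tuple[int, int]]:
    """Return the base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Assumed definition: number of pairs found in exactly one structure."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(pair_set(s1) ^ pair_set(s2))


def distance_distribution(samples: list[str]) -> Counter:
    """Histogram of base-pair distances over all unordered pairs of samples."""
    return Counter(bp_distance(a, b) for a, b in combinations(samples, 2))


def has_four_way_junction(dot_bracket: str) -> bool:
    """Simplified check: some loop closed by (i, j) holds exactly 3 helices."""
    pairs = pair_set(dot_bracket)
    partner: dict[int, int] = {}
    for i, j in pairs:
        partner[i] = j
        partner[j] = i
    for i, j in pairs:
        branches, k = 0, i + 1
        while k < j:
            if k in partner and partner[k] > k:
                branches += 1
                k = partner[k] + 1  # skip over the enclosed helix
            else:
                k += 1
        if branches == 3:  # closing helix + 3 branches = 4 helices meet
            return True
    return False
```

For example, `distance_distribution(["((...))", "(.....)", "......."])` yields `Counter({1: 2, 2: 1})`, and `has_four_way_junction("(((...))((...))((...)))")` returns `True`. Because every pair of samples is compared, the histogram costs O(n²) distance evaluations in the number of samples, which is one reason ensemble-scale analyses benefit from concurrent execution.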
Year
2011
DOI
10.1002/cpe.1796
Venue
Concurrency and Computation: Practice and Experience
Keywords
RNA gene finding, regulatory RNA finding, BCoV RNA genome, RNA sequence, RNA secondary structure, binding RNA, secondary structure, RNA folding, energy landscape analysis, RNA system, RNA length, RNA genome
DocType
Journal
Volume
23
Issue
17
ISSN
1532-0626
Citations
3
PageRank
0.45
References
5
Authors
5
Name | Order | Citations | PageRank
Joohyun Kim | 1 | 292 | 22.75
Wei Huang | 2 | 9 | 1.41
Sharath Maddineni | 3 | 34 | 4.19
Fareed Aboul-ela | 4 | 30 | 19.47
S. Jha | 5 | 7921 | 539.19