Title
Prediction of protein-RNA residue-base contacts using two-dimensional conditional random field with the lasso.
Abstract
To uncover molecular functions and networks in cellular systems, it is important to dissect interactions between proteins and RNAs, and many studies have investigated and analyzed interactions between protein amino acid residues and RNA bases. For interactions between residues in proteins, it is generally accepted that an amino acid residue at an interacting site has coevolved with its partner residue so as to maintain the interaction. Based on this hypothesis, in our previous study on identifying residue-residue contact pairs in interacting proteins, we calculated mutual information (MI) between amino acid residues from multiple sequence alignments of homologous proteins and combined it with a discriminative random field (DRF) approach, a special type of conditional random field (CRF) that has proved useful for extracting distinguishing areas from photographs in image processing. Recently, evolutionary correlation between residues and DNA bases has also been found in certain transcription factors and their DNA-binding sites.
In this paper, we employ two-dimensional CRFs, which are more generic than such DRFs, to predict interactions between protein amino acid residues and RNA bases. In addition, we introduce labels representing the kinds of amino acids and bases as local features of a CRF. Furthermore, we examine the utility of L1-norm regularization (lasso) for the CRF. To evaluate our method, we use residue-base interactions between several Pfam domains and Rfam entries, conduct cross-validation, and calculate the average AUC (area under the ROC curve) score.
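The mutual information feature mentioned above can be illustrated with a minimal sketch. It assumes two alignments whose rows are paired (row k of the protein alignment and row k of the RNA alignment come from the same organism); the toy sequences and the names `mutual_information`, `column`, `protein_msa`, and `rna_msa` are hypothetical and are not taken from the paper, which may compute or correct MI differently.

```python
from collections import Counter
from math import log2


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two aligned columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # p(a,b) / (p(a) * p(b)) written with raw counts
        mi += p_ab * log2(c * n / (count_x[a] * count_y[b]))
    return mi


def column(msa, i):
    """Return column i of an alignment given as a list of equal-length strings."""
    return [seq[i] for seq in msa]


# Hypothetical toy data: paired rows, two residue columns and two base columns.
protein_msa = ["KR", "KR", "EA", "EA"]
rna_msa = ["GA", "GA", "CA", "CA"]

# mi_matrix[i][j] scores residue position i against base position j.
mi_matrix = [
    [mutual_information(column(protein_msa, i), column(rna_msa, j))
     for j in range(len(rna_msa[0]))]
    for i in range(len(protein_msa[0]))
]
```

In this toy case the first base column covaries perfectly with both residue columns, while the second base column is fully conserved and therefore carries no mutual information with any residue column.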
The results suggest that our CRF-based method using mutual information and labels with the lasso is useful for further improving prediction performance, especially provided that the lasso successfully reduces the features of the CRF.
We propose simple and generic two-dimensional CRF models using labels and mutual information with the lasso. Combining the CRF-based method with the lasso is particularly useful for predicting residue-base contacts in protein-RNA interactions.
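To give a sense of how L1-norm regularization can reduce the number of features, the following sketch applies the soft-thresholding operator, which is the proximal step associated with an L1 penalty. This is a generic illustration only: the abstract does not state which optimizer the authors used, and the feature names, weights, and threshold below are hypothetical.

```python
def soft_threshold(weight, threshold):
    """Shrink a weight toward zero; weights within the threshold become exactly 0."""
    if weight > threshold:
        return weight - threshold
    if weight < -threshold:
        return weight + threshold
    return 0.0


# Hypothetical CRF feature weights (names and values are illustrative only).
weights = {"mutual_information": 0.9, "label_K_G": -0.4, "label_A_U": 0.05}
threshold = 0.1  # assumed step size times regularization strength

shrunk = {name: soft_threshold(w, threshold) for name, w in weights.items()}

# Features whose weight was driven to zero are effectively removed from the model.
selected = [name for name, w in shrunk.items() if w != 0.0]
```

Here the weakly weighted label feature is set to exactly zero and dropped, while the two stronger features survive with shrunken weights.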
Year: 2013
DOI: 10.1186/1752-0509-7-S2-S15
Venue: BMC Systems Biology
Keywords: systems biology, bioinformatics, algorithms, biomedical research
Field: Conditional random field, RNA, Random field, Binding site, Biology, Amino acid, Protein superfamily, Mutual information, Bioinformatics, Multiple sequence alignment
DocType: Journal
Volume: 7 Suppl 2
Issue: S-2
ISSN: 1752-0509
Citations: 4
PageRank: 0.35
References: 17
Authors: 4
Name                Order  Citations  PageRank
Morihiro Hayashida  1      154        21.88
Mayumi Kamada       2      35         3.99
Jiangning Song      3      374        41.93
Tatsuya Akutsu      4      2169       216.05