Title | ||
---|---|---|
Weakly-Supervised Convolutional Neural Network Architecture for Predicting Protein-DNA Binding. |
Abstract | ||
---|---|---|
Although convolutional neural networks (CNNs) have outperformed conventional methods in predicting the sequence specificities of protein-DNA binding in recent years, they do not take full advantage of the intrinsic weakly-supervised information in DNA sequences, namely that a bound sequence may contain multiple transcription factor binding sites (TFBSs). Here, we propose a weakly-supervised convolutional neural network architecture (WSCNN), which combines multiple-instance learning (MIL) with CNN, to further boost the performance of predicting protein-DNA binding. WSCNN first divides each DNA sequence into multiple overlapping subsequences (instances) with a sliding window, then models each instance separately using a CNN, and finally fuses the predicted scores of all instances in the same bag using four fusion methods: *Max*, *Average*, *Linear Regression*, and *Top-Bottom Instances*. The experimental results on *in vivo* and *in vitro* datasets illustrate the performance of the proposed approach. Moreover, models built on *in vitro* data using WSCNN can predict *in vivo* protein-DNA binding with good accuracy. In addition, through a series of experiments, we give a quantitative analysis of the importance of the reverse-complement mode in predicting *in vivo* protein-DNA binding, and explain why advanced pooling layers are not used directly to combine MIL with CNN. |
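
The bag-and-instance pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window and stride values, and the exact form of each fusion rule are assumptions made for illustration. In particular, the instance scores would come from a trained CNN in WSCNN (here they are simply passed in as numbers), the *Linear Regression* fusion is shown with weights supplied by the caller rather than learned, and *Top-Bottom Instances* is assumed to average the `k` highest and `k` lowest instance scores.

```python
from typing import List, Sequence


def sliding_instances(seq: str, window: int, stride: int) -> List[str]:
    """Split one DNA sequence (a bag) into overlapping subsequences (instances)."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(seq) <= window:
        return [seq]  # a short sequence forms a single instance
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def fuse_max(scores: Sequence[float]) -> float:
    """Max fusion: the bag score is the highest instance score."""
    return max(scores)


def fuse_average(scores: Sequence[float]) -> float:
    """Average fusion: the bag score is the mean instance score."""
    return sum(scores) / len(scores)


def fuse_linear(scores: Sequence[float], weights: Sequence[float],
                bias: float = 0.0) -> float:
    """Linear fusion: weighted sum of instance scores (weights assumed given)."""
    if len(scores) != len(weights):
        raise ValueError("one weight per instance is required")
    return bias + sum(w * s for w, s in zip(weights, scores))


def fuse_top_bottom(scores: Sequence[float], k: int = 1) -> float:
    """Assumed form: mean of the k highest and k lowest instance scores."""
    ordered = sorted(scores)
    k = max(1, min(k, len(ordered)))
    return (sum(ordered[-k:]) + sum(ordered[:k])) / (2 * k)


if __name__ == "__main__":
    # Toy bag: 13 bases, window 8, stride 1 gives 6 overlapping instances.
    bag = sliding_instances("AAATGACTCAAAA", window=8, stride=1)
    # Hypothetical per-instance scores standing in for CNN outputs.
    scores = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
    print(len(bag), fuse_max(scores), fuse_average(scores),
          fuse_top_bottom(scores, k=1))
```

With *Max* fusion a single high-scoring instance is enough to label the bag as bound, while *Average* dilutes that signal across all instances; comparing the two on the toy scores above makes the trade-off visible.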
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/TCBB.2018.2864203 | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Keywords | DocType | Volume |
---|---|---|
DNA, Proteins, Convolutional neural networks, Predictive models, In vivo, In vitro, Sequential analysis | Journal | 17 |
Issue | ISSN | Citations |
---|---|---|
2 | 1545-5963 | 1 |
PageRank | References | Authors |
---|---|---|
0.35 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Qinhu Zhang | 1 | 4 | 0.75 |
Lin Zhu | 2 | 74 | 4.93 |
Wenzheng Bao | 3 | 28 | 10.40 |
De-Shuang Huang | 4 | 5532 | 357.50 |