
Paper Info

Title
Weakly-Supervised Convolutional Neural Network Architecture for Predicting Protein-DNA Binding.

Abstract
Although convolutional neural networks (CNNs) have outperformed conventional methods in predicting the sequence specificities of protein-DNA binding in recent years, they do not take full advantage of the weakly-supervised information intrinsic to DNA sequences: a bound sequence may contain multiple transcription factor binding sites (TFBSs). Here, we propose a weakly-supervised convolutional neural network architecture (WSCNN), which combines multiple-instance learning (MIL) with CNN, to further boost the performance of predicting protein-DNA binding. WSCNN first divides each DNA sequence into multiple overlapping subsequences (instances) with a sliding window, then models each instance separately using a CNN, and finally fuses the predicted scores of all instances in the same bag using four fusion methods: Max, Average, Linear Regression, and Top-Bottom Instances. The experimental results on in vivo and in vitro datasets illustrate the performance of the proposed approach. Moreover, models built on in vitro data using WSCNN can predict in vivo protein-DNA binding with good accuracy.
In addition, through a series of experiments, we give a quantitative analysis of the importance of the reverse-complement mode in predicting in vivo protein-DNA binding, and explain why we do not directly use advanced pooling layers to combine MIL with CNN.
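
The sliding-window and fusion steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the window and stride values, and `toy_scorer` (a hypothetical stand-in for a trained CNN that simply checks for an example motif) are all assumptions made for illustration. Only the Max and Average fusion methods are shown, because the abstract does not specify how Linear Regression and Top-Bottom Instances are computed.

```python
from typing import Callable, List, Sequence


def split_into_instances(seq: str, window: int, stride: int = 1) -> List[str]:
    """Slide a fixed-width window over `seq` and return the overlapping instances."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(seq) <= window:
        return [seq]  # a short sequence forms a single-instance bag
    # Note: with stride > 1 the last few bases may not be covered by any window.
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def fuse_max(scores: Sequence[float]) -> float:
    """Max fusion: the bag score is the highest instance score."""
    return max(scores)


def fuse_average(scores: Sequence[float]) -> float:
    """Average fusion: the bag score is the mean of the instance scores."""
    return sum(scores) / len(scores)


def score_bag(
    seq: str,
    scorer: Callable[[str], float],
    window: int,
    stride: int,
    fuse: Callable[[Sequence[float]], float],
) -> float:
    """Score every instance of one sequence (bag) and fuse the scores."""
    instances = split_into_instances(seq, window, stride)
    return fuse([scorer(instance) for instance in instances])


def toy_scorer(instance: str) -> float:
    """Hypothetical stand-in for a trained CNN: 1.0 if an example motif occurs."""
    return 1.0 if "TGACTCA" in instance else 0.0


if __name__ == "__main__":
    bag = "AAAATGACTCAAAAA"
    print(split_into_instances("ACGTACGT", window=4, stride=2))  # ['ACGT', 'GTAC', 'ACGT']
    print(score_bag(bag, toy_scorer, 8, 1, fuse_max))      # 1.0
    print(score_bag(bag, toy_scorer, 8, 1, fuse_average))  # 0.25
```

In this toy example only two of the eight instances contain the motif, so Max fusion reports a confident bag score while Average fusion dilutes it. This shows how the choice of fusion method changes the bag-level prediction even when the instance scores are identical.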

Field	Value
Year	2020
DOI	10.1109/TCBB.2018.2864203
Venue	IEEE/ACM Transactions on Computational Biology and Bioinformatics
Keywords	DNA, Proteins, Convolutional neural networks, Predictive models, In vivo, In vitro, Sequential analysis
DocType	Journal
Volume	17
Issue	2
ISSN	1545-5963
Citations	1
PageRank	0.35
References	0
Authors	4

Authors (4 rows)


Name	Order	Citations	PageRank
Qinhu Zhang	1	4	0.75
Lin Zhu	2	74	4.93
Wenzheng Bao	3	28	10.40
De-Shuang Huang	4	5532	357.50
