Title
Machine learning in computational biology to accelerate high-throughput protein expression.
Abstract
Motivation: The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility.
Results: Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation.
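The three properties named above are standard sequence-derived descriptors. The sketch below computes them for a single fragment, assuming the common definitions: aromaticity as the combined relative frequency of Phe, Trp and Tyr; hydropathy as GRAVY, the mean Kyte-Doolittle index over all residues; and isoelectric point as the pH at which a Henderson-Hasselbalch net-charge model crosses zero, located by bisection. The pKa table is an EMBOSS-style set chosen for illustration (an assumption; other tools use different tables and report slightly different pI values), and the names `fragment_features` and `FragmentFeatures` are hypothetical. This is not the authors' pipeline; Biopython's `Bio.SeqUtils.ProtParam.ProteinAnalysis` offers `aromaticity()`, `gravy()` and `isoelectric_point()` for the same purpose.

```python
from dataclasses import dataclass

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# EMBOSS-style pKa values (assumption for illustration; tools differ).
PKA_NTERM = 8.6
PKA_CTERM = 3.6
PKA_POS = {"K": 10.8, "R": 12.5, "H": 6.5}            # side chains that gain +1
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # side chains that gain -1


@dataclass
class FragmentFeatures:  # hypothetical container for the three descriptors
    aromaticity: float
    gravy: float
    isoelectric_point: float


def _clean(seq: str) -> str:
    """Upper-case the sequence and reject empty or non-standard input."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KD)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return seq


def aromaticity(seq: str) -> float:
    """Fraction of residues that are Phe, Trp or Tyr."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value."""
    return sum(KD[aa] for aa in seq) / len(seq)


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge; decreases monotonically with pH."""
    pos = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))
    pos += sum(seq.count(aa) / (1.0 + 10 ** (ph - pka)) for aa, pka in PKA_POS.items())
    neg = 1.0 / (1.0 + 10 ** (PKA_CTERM - ph))
    neg += sum(seq.count(aa) / (1.0 + 10 ** (pka - ph)) for aa, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect pH in [0, 14] until the net charge changes sign."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def fragment_features(seq: str) -> FragmentFeatures:
    seq = _clean(seq)
    return FragmentFeatures(aromaticity(seq), gravy(seq), isoelectric_point(seq))


if __name__ == "__main__":
    print(fragment_features("MKWVTFISLLFLFSSAYS"))
```

Feature vectors of this kind could then be passed to any classifier to predict expression or solubility; the sketch deliberately stops at feature extraction and does not reproduce the authors' model.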
Year: 2017
DOI: 10.1093/bioinformatics/btx207
Venue: BIOINFORMATICS
Field: Computer science, Human Protein Atlas, Proteome, Artificial intelligence, Protein expression, Computational biology, Throughput, Bioinformatics, Workflow, Machine learning
DocType: Journal
Volume: 33
Issue: 16
ISSN: 1367-4803
Citations: 0
PageRank: 0.34
References: 10
Authors: 7
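Name | Order | Citations | PageRank
Anand Sastry | 1 | 0 | 2.03
Jonathan Monk | 2 | 0 | 3.04
Hanna Tegel | 3 | 0 | 0.34
Mathias Uhlén | 4 | 7 | 1.62
Bernhard O. Palsson | 5 | 75167.99 (Citations and PageRank run together)
Johan Rockberg | 6 | 0 | 0.34
Elizabeth Brunk | 7 | 2 | 1.41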