Title
A combinatorial perspective of the protein inference problem.
Abstract
In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to facilitate the identification of proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we take a combinatorial perspective on the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (protein probability means the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimation of protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. Our method achieves results comparable to those of ProteinProphet in a more efficient manner in experiments on two data sets of standard protein mixtures and two data sets of real samples. Based on our model, we study the impact of unique peptides and degenerate peptides (degenerate peptides are peptides shared by at least two proteins) on protein probabilities. We also study the relationship between our model and ProteinProphet. We name our program ProteinInfer. Its Java source code, our supplementary document and experimental results are available at: http://bioinformatics.ust.hk/proteininfer.
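The distinction the abstract draws between unique and degenerate peptides can be illustrated with a minimal sketch. It is not taken from ProteinInfer: the class name `PeptideClassifier`, its method names, and the peptide and protein identifiers are all hypothetical, and the only rule it encodes is the abstract's definition (a degenerate peptide is shared by at least two proteins).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of ProteinInfer: splits peptides into
// unique and degenerate ones using the definition given in the abstract.
class PeptideClassifier {

    // Peptides that map to two or more proteins (degenerate peptides).
    static List<String> degeneratePeptides(Map<String, Set<String>> peptideToProteins) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : peptideToProteins.entrySet()) {
            if (entry.getValue().size() >= 2) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Peptides that map to exactly one protein (unique peptides).
    static List<String> uniquePeptides(Map<String, Set<String>> peptideToProteins) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : peptideToProteins.entrySet()) {
            if (entry.getValue().size() == 1) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up peptide-to-protein assignments; a TreeMap keeps the output order stable.
        Map<String, Set<String>> peptideToProteins = new TreeMap<>();
        peptideToProteins.put("PEPTIDEA", Set.of("P1"));
        peptideToProteins.put("PEPTIDEB", Set.of("P1", "P2"));
        peptideToProteins.put("PEPTIDEC", Set.of("P2"));

        System.out.println("unique: " + uniquePeptides(peptideToProteins));
        System.out.println("degenerate: " + degeneratePeptides(peptideToProteins));
    }
}
```

In this toy input, PEPTIDEB is the only peptide shared by two proteins, so it is the only one reported as degenerate. How such peptides enter the protein probability calculation is specific to the paper's model and is not reproduced here.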
Year
2013
DOI
10.1109/TCBB.2013.110
Venue
IEEE/ACM Trans. Comput. Biology Bioinform.
Keywords
peptide identification result, protein inference, peptide identification, protein probability, combinatorial perspective, standard protein mixture, protein inference problem, conditional protein probability, protein identification, probability, bioinformatics, proteins, java, proteomics, upper bound, estimation
Field
Degenerate energy levels, Data set, Proteomics, Upper and lower bounds, Inference, Computer science, Peptide, ProteinProphet, Artificial intelligence, Bioinformatics, Shotgun proteomics, Machine learning
DocType
Journal
Volume
10
Issue
6
ISSN
1557-9964
Citations
1
PageRank
0.36
References
3
Authors
3
Name, Order, Citations, PageRank
Chao Yang, 1, 87, 22.49
Zengyou He, 2, 997, 64.72
Weichuan Yu, 3, 943, 57.38