Title
BMF: Bitmapped Mass Fingerprinting for Fast Protein Identification
Abstract
Protein identification is an important objective for the proteomic and medical sciences as well as for the pharmaceutical industry. With the recent large-scale automation of genome sequencing and the explosion of protein databases, it is important to exploit the latest data processing technologies and to design highly scalable algorithms that speed up protein identification. In this study, we design, implement, and evaluate a new software tool, Bitmapped Mass Fingerprinting (BMF), that efficiently constructs a bitmap index for short peptides and quickly identifies candidate proteins from leading protein databases. BMF is developed by integrating the FastBit indexing technology with the popular Message Passing Interface (MPI) for parallelization. By exploiting FastBit for peptide mass fingerprinting across protein boundaries, we accomplish parallel computation and I/O for a scalable implementation of protein identification. Our experimental results show that BMF substantially improves the performance of protein identification across various protein databases. In particular, we demonstrate that BMF scales effectively up to 8,192 cores on the Jaguar supercomputer at Oak Ridge National Laboratory when identifying proteins from the National Center for Biotechnology Information (NCBI) non-redundant (NR) protein database.
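The general idea named in the abstract, a bitmap index over peptide masses used to shortlist candidate proteins, can be illustrated with a small sketch. This is not BMF's implementation: it uses neither FastBit nor MPI, and the function names (digest, peptide_mass, build_index, candidate_hits), the tryptic cleavage rule, the 1 Da bin width, and the 0.5 Da tolerance are all assumptions chosen for illustration. Each mass bin holds one Python integer used as a bitset, where bit i is set if protein i yields a peptide in that bin.

```python
import re
from collections import defaultdict

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def digest(sequence):
    """Assumed tryptic digest: cleave after K or R unless followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_index(proteins, bin_width=1.0):
    """Map each mass bin to a bitset of the proteins with a peptide in it."""
    index = defaultdict(int)
    for i, sequence in enumerate(proteins):
        for peptide in digest(sequence):
            mass_bin = int(peptide_mass(peptide) // bin_width)
            index[mass_bin] |= 1 << i
    return index


def candidate_hits(index, observed_masses, n_proteins, tol=0.5, bin_width=1.0):
    """Count, per protein, how many observed masses fall in a matching bin.

    Binning is coarse, so this is only a candidate filter: a bin can contain
    masses slightly outside the tolerance window.
    """
    hits = [0] * n_proteins
    for mass in observed_masses:
        lo = int((mass - tol) // bin_width)
        hi = int((mass + tol) // bin_width)
        bits = 0
        for mass_bin in range(lo, hi + 1):
            bits |= index.get(mass_bin, 0)  # OR together the touched bins
        for i in range(n_proteins):
            if (bits >> i) & 1:
                hits[i] += 1
    return hits


if __name__ == "__main__":
    proteins = ["AAKGGR", "MKWVR", "SSKTTR"]  # toy sequences, not real data
    index = build_index(proteins)
    observed = [peptide_mass("MK"), peptide_mass("WVR")]
    print(candidate_hits(index, observed, len(proteins)))
```

Proteins with the highest hit counts would then be passed to a finer scoring step. In a parallel setting, one plausible design is to partition the database across processes and merge the per-partition hit counts, but how BMF divides the work is not described in this record.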
Year: 2011
DOI: 10.1109/CLUSTER.2011.11
Venue: CLUSTER
Keywords: software tool, protein identification, highly scalable algorithms, fast bit, pharmaceutical industry, short peptides, protein databases, national center for biotechnology information, genetics, fastbit, parallel computation, cray xt5, candidate protein, proteins, medical sciences, fast protein identification, large-scale automation, software tools, database indexing, bitmapped mass fingerprinting, message passing interface, biology computing, nonredundant protein database, fast bit indexing technology, oak ridge national laboratory, mpi, peptide mass fingerprinting, message passing, national center, protein boundary, protein database, various protein databases, genome sequencing, data processing technologies, proteomic sciences, jaguar supercomputer, microorganisms, amino acids, indexes, fingerprint recognition
Field: Bitmap index, Computer science, Parallel computing, Search engine indexing, Message Passing Interface, Cray XT5, Database index, Peptide mass fingerprinting, Message passing, Scalability
DocType: Conference
ISSN: 1552-5244
ISBN: 978-0-7695-4516-5
Citations: 3
PageRank: 0.46
References: 10
Authors: 5
Name            Order   Citations   PageRank
Weikuan Yu      1       1042        77.40
K. John Wu      2       3           0.46
Wei-Shinn Ku    3       775         69.22
Cong Xu         4       50          4.38
Juan Gao        5       3           0.46