Abstract |
---|
Protein identification is an important objective for the proteomic and medical sciences as well as for the pharmaceutical industry. With the recent large-scale automation of genome sequencing and the explosion of protein databases, it is important to exploit the latest data processing technologies and design highly scalable algorithms to speed up protein identification. In this study, we design, implement, and evaluate a new software tool, Bitmapped Mass Fingerprinting (BMF), that can efficiently construct a bitmap index for short peptides and quickly identify candidate proteins from leading protein databases. BMF is developed by integrating the FastBit indexing technology with the popular Message Passing Interface (MPI) for parallelization. By exploiting FastBit for peptide mass fingerprinting across protein boundaries, we are able to accomplish parallel computation and I/O for a scalable implementation of protein identification. Our experimental results show that BMF dramatically improves the performance of protein identification across various protein databases. In particular, we demonstrate that BMF can effectively scale up to 8,192 cores on the Jaguar supercomputer at Oak Ridge National Laboratory, achieving superb performance in identifying proteins from the National Center for Biotechnology Information (NCBI) non-redundant (NR) protein database. |
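
The abstract's core idea, a bitmap index over peptide masses that is queried to find candidate proteins, can be illustrated with a minimal single-process sketch. This is not the BMF implementation and does not use the FastBit API or MPI: the function names, the 1 Da bin width, the 0.5 Da tolerance, the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), and the use of Python integers as bitmaps are all assumptions made for illustration. Only the monoisotopic residue masses are standard values.

```python
from collections import Counter, defaultdict

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to every free peptide
BIN_WIDTH = 1.0    # assumed bin width in Da (hypothetical choice)


def peptide_mass(peptide):
    """Monoisotopic mass of a peptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def tryptic_digest(sequence):
    """Simplified trypsin rule: cleave after K/R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (aa in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def build_index(proteins):
    """Return (rows, bitmaps): one row per peptide, one bitmap per mass bin.

    Bit r of bitmaps[b] is set when peptide row r has a mass in bin b.
    """
    rows = []                    # row r -> (protein_id, peptide_mass)
    bitmaps = defaultdict(int)   # bin number -> bitmap stored as an int
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            mass = peptide_mass(peptide)
            bitmaps[int(mass // BIN_WIDTH)] |= 1 << len(rows)
            rows.append((protein_id, mass))
    return rows, bitmaps


def identify(rows, bitmaps, observed_masses, tol=0.5):
    """Rank proteins by how many observed masses match one of their peptides."""
    hits = Counter()
    for observed in observed_masses:
        low, high = observed - tol, observed + tol
        # OR together the bitmaps of every bin the tolerance window touches.
        candidates = 0
        for b in range(int(low // BIN_WIDTH), int(high // BIN_WIDTH) + 1):
            candidates |= bitmaps.get(b, 0)
        # Bins are coarse, so re-check each candidate row against the window.
        matched, row = set(), 0
        while candidates:
            if candidates & 1 and low <= rows[row][1] <= high:
                matched.add(rows[row][0])
            candidates >>= 1
            row += 1
        hits.update(matched)
    return hits.most_common()


if __name__ == "__main__":
    # Toy sequences, made up for the example.
    proteins = {"P1": "MKWVTFISLLR", "P2": "GASPVTK"}
    rows, bitmaps = build_index(proteins)
    spectrum = [peptide_mass("MK"), peptide_mass("WVTFISLLR")]
    print(identify(rows, bitmaps, spectrum))
```

In this sketch the bitmap lookup only narrows the search to rows in nearby mass bins; the exact tolerance check then removes false positives from the coarse binning. A parallel version could, under the same assumptions, split the protein database across processes and merge the per-process hit counts.
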
Year | DOI | Venue |
---|---|---|
2011 | 10.1109/CLUSTER.2011.11 | CLUSTER |
Keywords | Field | DocType |
software tool,protein identification,highly scalable algorithms,fast bit,pharmaceutical industry,short peptides,protein databases,national center for biotechnology information,genetics,fastbit,parallel computation,cray xt5,candidate protein,proteins,medical sciences,fast protein identification,large-scale automation,software tools,database indexing,bitmapped mass fingerprinting,message passing interface,biology computing,nonredundant protein database,fast bit indexing technology,oak ridge national laboratory,mpi,peptide mass fingerprinting,message passing,national center,protein boundary,protein database,various protein databases,genome sequencing,data processing technologies,proteomic sciences,jaguar supercomputer,microorganisms,amino acids,indexes,fingerprint recognition | Bitmap index,Computer science,Parallel computing,Search engine indexing,Message Passing Interface,Cray XT5,Database index,Peptide mass fingerprinting,Message passing,Scalability | Conference |
ISSN | ISBN | Citations |
1552-5244 | 978-0-7695-4516-5 | 3 |
PageRank | References | Authors |
0.46 | 10 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Weikuan Yu | 1 | 1042 | 77.40 |
K. John Wu | 2 | 3 | 0.46 |
Wei-Shinn Ku | 3 | 775 | 69.22 |
Cong Xu | 4 | 50 | 4.38 |
Juan Gao | 5 | 3 | 0.46 |