Title
MapReduce implementation of a hybrid spectral library-database search method for large-scale peptide identification.
Abstract
A MapReduce-based implementation called MR-MSPolygraph for parallelizing peptide identification from mass spectrometry data is presented. The underlying serial method, MSPolygraph, uses a novel hybrid approach to match an experimental spectrum against a combination of a protein sequence database and a spectral library. Our MapReduce implementation can run on any Hadoop cluster environment. Experimental results demonstrate that, relative to the serial version, MR-MSPolygraph reduces the time to solution from weeks to hours when processing tens of thousands of experimental spectra. Speedup and other related performance studies are also reported on a 400-core Hadoop cluster using spectral datasets from environmental microbial communities as inputs. The source code and user documentation are available at http://compbio.eecs.wsu.edu/MR-MSPolygraph. Contact: ananth@eecs.wsu.edu; william.cannon@pnnl.gov. Supplementary data are available at Bioinformatics online.
Year
2011
DOI
10.1093/bioinformatics/btr523
Venue
Bioinformatics
Keywords
serial version, large-scale peptide identification, 400-core hadoop cluster, spectral library, mapreduce implementation, mapreduce-based implementation, mass spectrometry data, experimental spectrum, hadoop cluster environment, hybrid spectral library-database search, spectral datasets, mass spectroscopy, mass spectrometry, spectrum, protein sequence, microbial community, database search
Field
Data mining, Sequence database, Computer science, Software documentation, Source code, Database search engine, Software, Bioinformatics, Speedup
DocType
Journal
Volume
27
Issue
21
ISSN
1367-4811
Citations
12
PageRank
0.79
References
1
Authors
4
Name                   Order  Citations  PageRank
Kalyanaraman, Ananth   1      221        31.95
William R. Cannon      2      69         10.68
Benjamin Latt          3      12         0.79
Douglas J. Baxter      4      22         4.98